




Psychometrika 


VOLUME XXIII—1958 
JANUARY-DECEMBER 





Editorial Council 


Chairman: HAROLD GULLIKSEN               Managing Editor: LYLE V. JONES


Editors: DOROTHY C. ADKINS, PAUL HORST    Assistant Managing Editor: B. J. WINER


Editorial Board 


DOROTHY C. ADKINS      WM. K. ESTES             FREDERIC M. LORD
R. L. ANDERSON         HENRY E. GARRETT         IRVING LORGE
T. W. ANDERSON         LEO A. GOODMAN           QUINN MCNEMAR
J. B. CARROLL          BERT F. GREEN            GEORGE A. MILLER
H. S. CONRAD           J. P. GUILFORD           WM. G. MOLLENKOPF
C. H. COOMBS           HAROLD GULLIKSEN         LINCOLN E. MOSES
L. J. CRONBACH         PAUL HORST               GEORGE E. NICHOLSON
E. E. CURETON          ALSTON S. HOUSEHOLDER    M. W. RICHARDSON
PAUL S. DWYER          LLOYD G. HUMPHREYS       R. L. THORNDIKE
ALLEN EDWARDS          TRUMAN L. KELLEY         LEDYARD TUCKER
MAX D. ENGELHART       ALBERT K. KURTZ          D. F. VOTAW, JR.





PUBLISHED QUARTERLY 


By THE PSYCHOMETRIC SOCIETY 
AT 1407 SHERWOOD AVENUE 
RICHMOND 5, VIRGINIA 








Psychometrika 


CONTENTS 


GENERAL RESOLUTION OF CORRELATION MATRICES INTO 
COMPONENTS AND ITS UTILIZATION IN MULTIPLE 
AND PARTIAL REGRESSION 


JOHN A. CREAGER


ERROR OF MEASUREMENT AND THE SENSITIVITY OF A 
TEST OF SIGNIFICANCE 


J. P. SUTCLIFFE


DETERMINATION OF PARAMETERS OF A FUNCTIONAL 
RELATION BY FACTOR ANALYSIS 


LEDYARD R TUCKER


THE INCLUSION OF RESPONSE TIMES WITHIN A STO- 
CHASTIC DESCRIPTION OF THE LEARNING BE- 
HAVIOR OF INDIVIDUAL SUBJECTS 

R. J. AUDLEY 


DETERMINING THE DEGREE OF INCONSISTENCY IN A SET 
OF PAIRED COMPARISONS 
HAROLD B. GERARD AND HAROLD N. SHAPIRO


PROPERTIES OF THE ITEM SCORE MATRIX
ANGUS G. MACLEAN


THE COUNSELING ASSIGNMENT PROBLEM
JOE H. WARD, JR.


A RETEST METHOD OF STUDYING PARTIAL KNOWLEDGE 
AND OTHER FACTORS INFLUENCING ITEM RE- 








VERA T. BROWNLESS AND JOHN A. KEATS


THE MEASUREMENT OF FUNCTION FLUCTUATION
R. F. GARSIDE


PREDETERMINATION OF TEST WEIGHTS
PAUL J. HOFFMAN


RULES FOR PREPARATION OF MANUSCRIPTS FOR PSYCHO- 
METRIKA 








VOLUME TWENTY-THREE MARCH 1958 NUMBER 1 











PSYCHOMETRIC MONOGRAPHS 
The issues of this series are: 


THURSTONE, L. L. Primary mental abilities.
Psychometric Monograph No. 1, $3.00. (Second impression, cloth binding.)

THURSTONE, L. L. AND THURSTONE, THELMA GWINN. Factorial studies of intelligence.
Psychometric Monograph No. 2, (out of print).

WOLFLE, DAEL. Factor analysis to 1940.
Psychometric Monograph No. 3, (out of print).

THURSTONE, L. L. A factorial study of perception.
Psychometric Monograph No. 4, (out of print).

FRENCH, JOHN W. The description of aptitude and achievement tests
in terms of rotated factors.
Psychometric Monograph No. 5, $4.00.

DEGAN, JAMES W. Dimensions of functional psychosis.
Psychometric Monograph No. 6, $1.50.

LORD, FREDERIC. A theory of test scores.
Psychometric Monograph No. 7, $2.00.

ROFF, MERRILL. A factorial study of tests in the perceptual area.
Psychometric Monograph No. 8, $1.50.

Orders for Psychometric Monograph No. 1 should be sent to The University
of Chicago Press, 5750 S. Ellis Avenue, Chicago 37, Illinois. Orders
for No. 5 through No. 8 should be sent to The William Byrd Press, Box 2-w,
Richmond 5, Virginia.














PSYCHOMETRIKA—VOL. 23, NO. 1 
MARCH, 1958 


GENERAL RESOLUTION OF CORRELATION MATRICES INTO 
COMPONENTS AND ITS UTILIZATION IN MULTIPLE AND 
PARTIAL REGRESSION* 


JOHN A. CREAGER
AIR FORCE PERSONNEL AND TRAINING RESEARCH CENTER 


The derivation of multiple and partial regression statistics from 
uniqueness-augmented factor loadings, presented in the literature for 
orthogonal factor solutions, is generalized to oblique solutions. A mathematical
rationale for the general case, without restriction to uncorrelated factors, 
is presented. Use of the general formulation is illustrated with a two-factor, 
seven-variable example. 


The considerable amount of computational time and labor required to 
compute multiple and partial correlation statistics when dealing with large 
test batteries is largely due to the necessity of computing the inverse of an 
nth order correlation matrix when classical procedures are used. Compu- 
tation of multiple regression statistics from factor statistics permits con- 
siderable reduction in time and labor, especially when the number of variables 
is large and the number of factors is small [1, 3, 4]. Once the factorial reduc- 
tion of the correlation matrix has been effected, any or all of the multiple 
and partial correlations or regression weights may be obtained. Furthermore, 
the factor solution may be studied to determine which predictors are most 
likely, when combined, to yield high prediction of a given variable. 

The mathematical foundations and computational techniques for 
obtaining multiple and partial regression statistics have been presented for 
orthogonal factor solutions by Guttman [3], Guttman and Cohen [4], Dwyer 
[1], and Horst [5]. Some of the saving in computational effort is lost by the 
preliminary factor analysis, especially if the centroid method is used with 
computation of residuals after extracting each factor. Dwyer [2] has pre- 
sented an example in which preliminary factoring was done using the square 
root or diagonal method. The multiple-group method, however, permits 
the extraction of several factors simultaneously and is therefore highly 
efficient. Since the multiple-group method will, in general, result in correlated 
factors, the solution must either be orthogonalized, which requires appreciable 
additional computation, or oblique factor statistics must be used directly 
to obtain the multiple and partial regression statistics. 

It is the purpose of this paper to present the mathematical rationale,
and to demonstrate, by an illustrative example, the computational schemes
for obtaining multiple and partial regression statistics from oblique factor
solutions.

*This report is based on work done under ARDC Project 7702, in support of the
research and development program of the Air Force Personnel and Training Research
Center, Lackland Air Force Base, Texas. Permission is granted for reproduction, translation,
publication, use, and disposal in whole and in part by or for the United States Government.


Fundamental Relations 


Let $R$ be an $n \times n$ correlation matrix of n variables with unit diagonals.
Let $R$ be factored, without restriction to uncorrelated factors, into r common
factors and n unique factors, yielding

(i) a factor structure matrix, $S$, of order $n \times r$,

(ii) a factor intercorrelation matrix, $\Phi$, of order $r \times r$,

(iii) a factor pattern matrix, $P$, of order $n \times r$, obtained from $P = S\Phi^{-1}$,

(iv) a diagonal matrix, $U$, of order n, giving the unique factor loadings.

Then

(1)    $R = SP' + U^2.$

Formula (1) states the fundamental factor theorem in general terms, 
where resolution of a correlation matrix is made into common factors, either 
correlated or uncorrelated, and unique factors which are uncorrelated either 
inter se or with the common factors. 

In the subsequent development it is assumed that matrices $R$ and $U^2$
are nonsingular. Let $V = U^{-1}$, and define $B = VS$ and $C' = P'V$, the
uniqueness-augmented structure and pattern, respectively. Also let

(2)    $Q = I + P'V^2S,$

where $Q$ is a Gramian matrix of order and rank r.


The Inverse of the Intercorrelation Matrix


The inverse of an intercorrelation matrix, $R^{-1}$, may be expressed in
terms of oblique factor statistics. Starting with (1) and premultiplying
both sides by $P'V^2$ gives

(3)    $P'V^2R = P'V^2SP' + P' = (P'V^2S + I)P' = QP'.$

Postmultiplying by $R^{-1}$ gives

(4)    $P'V^2 = QP'R^{-1},$

and therefore

(5)    $Q^{-1}P'V^2 = P'R^{-1}.$

Premultiplying both sides of (5) by $S$, the factor structure, and adding
$U^2R^{-1}$ gives

(6)    $SQ^{-1}P'V^2 + U^2R^{-1} = (SP' + U^2)R^{-1} = I.$

Subtracting $SQ^{-1}P'V^2$ from both sides and dividing by $U^2$ yields

(7)    $R^{-1} = V^2(I - SQ^{-1}P'V^2) = V^2 - VBQ^{-1}C'V.$













Use of (7) requires $Q^{-1}$, which is of order r, compared to $R^{-1}$, which is of order n.
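Formula (7) can be checked numerically on a small made-up example. The sketch below is not part of the original paper: the pattern matrix `P`, the factor intercorrelations `PHI`, and all helper names are hypothetical, and the uniquenesses are simply chosen so that R has unit diagonals. It builds R from (1), forms Q by (2), and confirms that the inverse given by (7) reproduces the identity when multiplied by R.

```python
def matmul(a, b):
    """Plain-list matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(aug[k][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for k in range(n):
            if k != c:
                f = aug[k][c]
                aug[k] = [x - f * y for x, y in zip(aug[k], aug[c])]
    return [row[n:] for row in aug]


# Hypothetical oblique solution: 4 tests, 2 correlated factors.
P = [[0.7, 0.1], [0.6, -0.2], [0.1, 0.8], [-0.1, 0.5]]
PHI = [[1.0, 0.3], [0.3, 1.0]]
n, r = len(P), len(PHI)

S = matmul(P, PHI)                                   # structure, S = P Phi
common = matmul(S, transpose(P))                     # S P'
u2 = [1.0 - common[j][j] for j in range(n)]          # uniquenesses giving unit diagonals
R = [[common[i][j] + (u2[i] if i == j else 0.0) for j in range(n)] for i in range(n)]

v = [u ** -0.5 for u in u2]                          # diagonal of V = U^{-1}
B = [[v[j] * s for s in S[j]] for j in range(n)]     # B = V S
C = [[v[j] * p for p in P[j]] for j in range(n)]     # C = V P, so C' = P'V
CtB = matmul(transpose(C), B)
Q = [[(1.0 if p == q else 0.0) + CtB[p][q] for q in range(r)] for p in range(r)]  # (2)

core = matmul(matmul(B, inverse(Q)), transpose(C))   # B Q^{-1} C'
R_inv = [[(v[i] ** 2 if i == j else 0.0) - v[i] * core[i][j] * v[j]
          for j in range(n)] for i in range(n)]      # formula (7)

product = matmul(R, R_inv)
max_err = max(abs(product[i][j] - (1.0 if i == j else 0.0))
              for i in range(n) for j in range(n))
print(max_err)
```

Only the r-by-r matrix `Q` is inverted on the way to `R_inv`; the general `inverse` helper is applied to a 2 x 2 matrix here, which is the saving the text describes.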


Obtaining Regression Statistics 


Standard regression weights to be applied to predictor variables in the
multiple regression of a given criterion may be obtained in either of two
ways. If partial correlation statistics are not required, the $Q$ matrix may
be developed by (2) using uniqueness-augmented factor statistics for the
predictors only. Let this matrix be designated as $Q_j$, where j refers to the
omitted criterion variable. If $Q_j$ is used in (7), the inverse of the predictor
intercorrelation matrix will be obtained. The desired regression weights
may then be obtained by

(8)    $\beta = R^{-1} r_j,$

where $r_j$ is a column vector of validity coefficients of order $n \times 1$, and $\beta$
is a column vector of the desired weights. The multiple correlation coefficient
for the set of predictors and the given criterion is given by

(9)    $R_j^2 = \beta' r_j.$

If regression weights are not required, the multiple correlation coefficient
may be obtained directly from $R^{-1}$ by

(9a)   $R_j^2 = \dfrac{R^{jj} - 1}{R^{jj}},$

where $R^{jj}$ denotes the jth diagonal element of $R^{-1}$.
If partial correlations are desired, the inverse of the total correlation matrix,
including the criterion validities, is required. In such a situation the regression
weights and multiple correlation may be obtained from the $Q$ matrix developed
from the entire set of variables. The inverse, $R^{-1}$, is computed from
the $Q$ matrix as indicated by (7); the regression weights are then obtained by

(10)   $\beta = -D^{-1}R^{-1},$

where $D$ is a diagonal matrix derived from the diagonal elements of $R^{-1}$.
The multiple correlation coefficients may then be computed as before by
(9). Partial correlations holding constant $n - 2$ variables may be obtained by

(11)   $R_{jk\cdot(n-2)} = -D^{-1/2}R^{-1}D^{-1/2}.$
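The relations among (9), (9a), (10), and (11) can be illustrated on a small correlation matrix. The sketch below is an illustration only, under stated assumptions: the 3 x 3 matrix `R` is hypothetical, the inverse is taken directly rather than through (7), and the "classical" first-order partial correlation formula is included purely as a cross-check.

```python
from math import sqrt


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(aug[k][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for k in range(n):
            if k != c:
                f = aug[k][c]
                aug[k] = [x - f * y for x, y in zip(aug[k], aug[c])]
    return [row[n:] for row in aug]


# Hypothetical total correlation matrix; variable 0 plays the role of criterion.
R = [[1.0, 0.5, 0.3],
     [0.5, 1.0, 0.4],
     [0.3, 0.4, 1.0]]
R_inv = inverse(R)
d = [R_inv[k][k] for k in range(3)]          # diagonal elements of R^{-1}

j = 0
# Off-diagonal elements of row j of -D^{-1} R^{-1}, as in (10).
beta = {k: -R_inv[j][k] / d[j] for k in range(3) if k != j}

r2_from_weights = sum(beta[k] * R[j][k] for k in beta)    # (9): beta' r_j
r2_from_inverse = (d[j] - 1.0) / d[j]                     # (9a)

partial_01 = -R_inv[0][1] / sqrt(d[0] * d[1])             # (11), holding variable 2 constant
classical_01 = (R[0][1] - R[0][2] * R[1][2]) / sqrt(
    (1.0 - R[0][2] ** 2) * (1.0 - R[1][2] ** 2))

print(beta, r2_from_weights, r2_from_inverse, partial_01, classical_01)
```

The two routes to the squared multiple correlation agree, and the partial correlation from the inverse matches the classical formula.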


The Prediction of Factor Scores 


The matrix of regression weights for predicting common factors from
tests, $W_c$, is obtained from postmultiplying the inverse of the predictor
intercorrelations by factor "validities" (the common factor structure),

(12)   $W_c = R^{-1}S = V^2(I - SQ^{-1}P'V^2)S = V^2S - VBQ^{-1}C'VS.$










Similarly, the matrix of regression weights for predicting unique factor
scores, $W_u$, is

(13)   $W_u = R^{-1}U = [V^2 - VBQ^{-1}C'V]U = V - VBQ^{-1}C'.$


The corresponding squared multiple correlation coefficients may then be 
obtained as the product sum of regression weights and validities. 

In a situation in which only the multiple correlation coefficient for 
predicting a common factor from test scores is desired, and the regression 
weights are not needed for a prediction equation, it may be obtained very 
readily without computation of R~* or the regression weights. The multiple 
correlation coefficient for a common factor from tests and the remaining 
common factors is equal to that from tests alone, since all of the common 
variance is in the test battery and adding the common variance to the battery 
will not change its predictive power. Guttman [3] and Dwyer [1] have shown 
that the multiple correlation coefficient for predicting a common factor from 
remaining factors and tests, for the orthogonal case, is 





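(14)   $R_p = \sqrt{1 - \dfrac{1}{1 + \sum_{j=1}^{n} B_{jp}^2}} = \sqrt{\dfrac{\sum_{j=1}^{n} B_{jp}^2}{1 + \sum_{j=1}^{n} B_{jp}^2}}.$

A similar development for oblique factors yields

(15)   $R_p = \sqrt{\dfrac{\sum_{j=1}^{n} B_{jp}C_{jp}}{1 + \sum_{j=1}^{n} B_{jp}C_{jp}}}.$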





Computational Techniques 


To illustrate computational techniques for the application of the 
principles developed above, the seven-variable, two-factor example used by 
Dwyer [1] is convenient, although the saving in computational effort becomes 
more convincing as the number of tests increases more rapidly than the 
number of factors. The correlation matrix is given in Table 1 with exact 
communalities in the diagonal cells. This matrix was factored by the multiple- 
group method, the summations being made over variables 1, 2, and 7 for 
factor I, and over variables 3 and 4 for factor II. The resulting factorial 
statistics are shown in Table 2. In usual applications where exact communali- 
ties are not known, it is necessary to use estimates [7]. 

In a practical situation it is necessary to judge the rank of R and to
test this judgment by examination of the residuals. If r is underestimated,
appreciable residuals will remain; if r is overestimated, some of the saving
in computational labor will be lost. It is essential that residuals be negligible
before proceeding with computation of regression statistics. Otherwise the
latter will be approximated to a degree dependent upon the magnitude of
residuals. The multiple correlation obtained under these conditions will
generally be high by an amount approximately equal to the average of the
absolute residual error [1].


TABLE 1

The Reduced Correlation Matrix*

Test      1      2      3      4      5      6      7
 1       450    580   -280    010    360    380    610
 2       580    760   -280    100    520    440    780
 3      -280   -280    700    560    140   -560   -420
 4       010    100    560    610    400   -340   -030
 5       360    520    140    400    540    080    460
 6       380    440   -560   -340    080    520    540
 7       610    780   -420   -030    460    540    830

*Decimal points have been omitted.

Once the factorial reduction of R has been accomplished and the rth
residuals checked for an indication of the completeness of extraction, the
diagonal matrices $V^2$ and $V$ are computed by taking the reciprocals of $U^2$
and $U$, respectively. Each row of the factor structure and pattern is then
multiplied by $v_{jj}$ to obtain the uniqueness-augmented structure, $B$, and the
uniqueness-augmented pattern, $C$. These are shown in Table 3.

The next step is forming the matrix $Q$. This is done by summing
uniqueness-augmented, structure-pattern cross products as follows:

(16)   $Q = \begin{bmatrix} 1 + \sum_j B_{j1}C_{j1} & \sum_j B_{j2}C_{j1} & \cdots & \sum_j B_{jr}C_{j1} \\ \sum_j B_{j1}C_{j2} & 1 + \sum_j B_{j2}C_{j2} & \cdots & \sum_j B_{jr}C_{j2} \\ \vdots & \vdots & & \vdots \\ \sum_j B_{j1}C_{jr} & \sum_j B_{j2}C_{jr} & \cdots & 1 + \sum_j B_{jr}C_{jr} \end{bmatrix}.$

The summations are performed across tests, including the criterion variable,
and across whatever predictor variables one may wish to include in the
prediction. The $Q$ matrix for all seven variables is shown for the illustrative
example in Table 3. It is important to remember to add unity to the
cross-product summations for the diagonal values of $Q$.
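The cross-product summation can be sketched in a few lines. The block below is an illustration, not the author's computing scheme: the function name `q_matrix` and its `rows` argument are hypothetical, and the entries of `B` and `C` are the uniqueness-augmented structure and pattern of the seven-variable example as read from Table 3, so the last digit of any entry may differ slightly from the printed table.

```python
# Uniqueness-augmented structure B and pattern C for tests 1-7 (factors I, II),
# as read from Table 3 of the illustrative example.
B = [[0.9043, -0.2335], [1.7695, -0.2358], [-0.7318, 1.4758], [0.0524, 1.2019],
     [0.8080, 0.5108], [0.8027, -0.8331], [2.2018, -0.7002]]
C = [[0.8993, -0.0213], [1.8151, 0.1927], [-0.4061, 1.3800], [0.3560, 1.2860],
     [0.9835, 0.7428], [0.6417, -0.6819], [2.1567, -0.1911]]


def q_matrix(b, c, rows=None):
    """Q = I + C'B, summing cross products over the selected tests only."""
    rows = range(len(b)) if rows is None else rows
    r = len(b[0])
    return [[(1.0 if p == q else 0.0) + sum(c[j][p] * b[j][q] for j in rows)
             for q in range(r)] for p in range(r)]


Q_all = q_matrix(B, C)                          # all seven variables
Q_1 = q_matrix(B, C, rows=range(1, len(B)))     # criterion (variable 1) omitted

for row in Q_all:
    print([round(x, 4) for x in row])
for row in Q_1:
    print([round(x, 4) for x in row])
```

Passing a subset of rows corresponds to developing $Q_j$ from the predictors only; unity is added on the diagonal in both cases.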

Table 4 shows the methods outlined for predicting variable 1. Matrices
$B$ and $C'$ were obtained from Table 3 and matrix $Q^{-1}$ by inversion of the $Q$
matrix (involving all seven variables) in Table 3. The subsequent operations
are also illustrated using variable 1 as the criterion variable. It is seen that,
in the usual practical situation, only single rows of the subsequent matrices
need to be computed. Hence, only the first row of each of the subsequent
matrices is shown in Table 4. Row 1 of $VV'$ was obtained by multiplying each
$v_{jj}$ by 1.3484 ($v_{11}$ from Table 3). The first row of $R^{-1}$ is then obtained by
multiplying each element of the row of $BQ^{-1}C'$ by the corresponding element of
the same row of $VV'$ and reversing the sign. The jth element (i.e., the diagonal
element) of the row (in this case, the first cell entry) is then adjusted by
adding $v_{jj}^2$ from Table 3. Thus the 1.6893 in the first cell of $R^{-1}$ in Table 4
was obtained by multiplying 0.0709 × (-1.8182) = -0.1289 and adding
1.8182.


TABLE 2

The Factor Statistics*

                 S                  P
Test        I       II         I       II     Communality   Uniqueness
 1        6706    -1732      6669    -0158       4500          5500
 2        8669    -1155      8892     0944       7600          2400
 3       -4008     8083     -2224     7558       7000          3000
 4        0327     7506      2223     8031       6100          3900
 5        5480     3464      6670     5038       5400          4600
 6        5561    -5772      4446    -4724       5200          4800
 7        9078    -2887      8892    -0788       8300          1700

               $\Phi$                    $\Phi^{-1}$
             I        II              I        II
 I        1.0000    -.2361         1.0590     .2500
 II       -.2361    1.0000          .2500    1.0590

*Decimal points have been omitted except in $\Phi$ and $\Phi^{-1}$.


TABLE 3

Uniqueness Augmentation of the Factor Statistics

                                      B                    C
Test   $v_{jj}^2$   $v_{jj}$       I        II          I        II
 1       1.8182     1.3484      0.9043  -0.2335      0.8993  -0.0213
 2       4.1665     2.0412      1.7695  -0.2358      1.8151   0.1927
 3       3.3335     1.8258     -0.7318   1.4758     -0.4061   1.3800
 4       2.5642     1.6013      0.0524   1.2019      0.3560   1.2860
 5       2.1742     1.4745      0.8080   0.5108      0.9835   0.7428
 6       2.0834     1.4434      0.8027  -0.8331      0.6417  -0.6819
 7       5.8826     2.4254      2.2018  -0.7002      2.1567  -0.1911

                $Q$                        $Q_1$
             I         II              I         II
 I        11.3993   -2.3520         10.5861   -2.1420
 II       -0.9887    5.6233         -0.9694    5.6183


TABLE 4

Computations for Regression Statistics

                 B                    $Q^{-1}$                   $BQ^{-1}$
Test        I        II             I         II             I        II
 1       0.9043  -0.2335     I   0.09103   0.03807        0.0786  -0.0086
 2       1.7695  -0.2358     II  0.01600   0.18453        0.1573   0.0239
 3      -0.7318   1.4758                                 -0.0430   0.2445
 4       0.0524   1.2019                                  0.0240   0.2238
 5       0.8080   0.5108                                  0.0817   0.1250
 6       0.8027  -0.8331                                  0.0597  -0.1232
 7       2.2018  -0.7002                                  0.1892  -0.0454

$C'$
Test                 1        2        3        4        5        6        7
 I                0.8993   1.8151  -0.4061   0.3560   0.9835   0.6417   2.1567
 II              -0.0213   0.1927   1.3800   1.2860   0.7428  -0.6819  -0.1911

$BQ^{-1}C'$ (Row 1)  0.0709   0.1410  -0.0438   0.0169   0.0709   0.0563   0.1712

$VV'$ (Row 1)        1.8182   2.7524   2.4619   2.1592   1.9882   1.9463   3.2704

$R^{-1}$ (Row 1)     1.6893  -0.3881   0.1078  -0.0365  -0.1410  -0.1096  -0.5599

$d_1 = 1.6893$    $\sqrt{d_1} = 1.2997$    $1/d_1 = 0.59196$    $1/\sqrt{d_1} = 0.7694$

$\beta$ (Row 1)               0.2297  -0.0638   0.0216   0.0835   0.0649   0.3314

The regression coefficients for predicting variable 1 are then obtained
by multiplying each element in the row of $R^{-1}$ by $-1/d_j$, where $d_j$ is the
diagonal element of $R^{-1}$.

The inverse of R may be checked by recalling that $RR^{-1} = I$. In the
present example the first row of R multiplied by the first row of $R^{-1}$ gives
0.9997, and the second row of R multiplied by the first row of $R^{-1}$ gives
-0.0004. It is, of course, the complete correlation matrix and its inverse
that are involved here, rather than the reduced matrix shown in Table 1.

The square of the multiple correlation of variable 1 on the other six
variables is obtained by multiplying the first row of the $\beta$ matrix by the
first row of R, omitting $R_{11}$. This gives $R^2_{1\cdot234567} = 0.408182$ and
$R_{1\cdot234567} = .6389$. Use of formula (9a) gives $R^2_{1\cdot234567} = 0.408039$ and
$R_{1\cdot234567} = .6388$, the value obtained by Dwyer. $\beta_{12}$ is
0.3881 × 0.59196 = .2297.

To obtain the partial correlation between variables 1 and 2 holding
constant the remaining five variables, the diagonal element of the second
row of $R^{-1}$ is required. The corresponding element of $BQ^{-1}C'$ is (0.1573)
(1.8151) + (0.0239) × (0.1927) = 0.2901; $v_{22}^2 = 4.1665$. The negative product
of these is -1.2087, and $d_2 = 2.9578$. The partial correlation coefficient is
then obtained from the (1, 2) cell of $R^{-1}$:

$(-0.3881)(-1/\sqrt{d_1})(1/\sqrt{d_2}) = -0.3881 \times -0.7694 \times 0.5815 = 0.1736.$


By similar operations, applying (12) and (13), regression statistics for the 
prediction of factor scores may be obtained. 
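The worked example can be retraced in a short script. This is a sketch under stated assumptions, not the author's desk-calculator scheme: `V`, `B`, and `C` are the values of $v_{jj}$, $B$, and $C$ as read from Table 3, `R1` holds the correlations of variable 1 with variables 2 to 7 from Table 1, the helper `r_inv_row` is hypothetical, and small differences from the printed figures in the fourth decimal are to be expected because the inputs are rounded.

```python
from math import sqrt

V = [1.3484, 2.0412, 1.8258, 1.6013, 1.4745, 1.4434, 2.4254]
B = [[0.9043, -0.2335], [1.7695, -0.2358], [-0.7318, 1.4758], [0.0524, 1.2019],
     [0.8080, 0.5108], [0.8027, -0.8331], [2.2018, -0.7002]]
C = [[0.8993, -0.0213], [1.8151, 0.1927], [-0.4061, 1.3800], [0.3560, 1.2860],
     [0.9835, 0.7428], [0.6417, -0.6819], [2.1567, -0.1911]]
R1 = [0.58, -0.28, 0.01, 0.36, 0.38, 0.61]   # r_12 ... r_17 from Table 1

n, n_factors = len(B), 2

# Q = I + C'B for all seven variables, then its 2 x 2 inverse.
Q = [[(1.0 if p == q else 0.0) + sum(C[j][p] * B[j][q] for j in range(n))
      for q in range(n_factors)] for p in range(n_factors)]
det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
Q_inv = [[Q[1][1] / det, -Q[0][1] / det],
         [-Q[1][0] / det, Q[0][0] / det]]


def r_inv_row(i):
    """Row i of R^{-1} = V^2 - V B Q^{-1} C' V, computed without forming R."""
    bq = [sum(B[i][p] * Q_inv[p][q] for p in range(n_factors))
          for q in range(n_factors)]                      # row i of B Q^{-1}
    row = []
    for j in range(n):
        core = sum(bq[q] * C[j][q] for q in range(n_factors))   # (i, j) of B Q^{-1} C'
        row.append(-V[i] * V[j] * core + (V[i] ** 2 if i == j else 0.0))
    return row


row1, row2 = r_inv_row(0), r_inv_row(1)
d1, d2 = row1[0], row2[1]                       # diagonal elements of R^{-1}

beta = [-x / d1 for x in row1[1:]]              # weights for predicting variable 1
r2_weights = sum(w * r for w, r in zip(beta, R1))   # as in (9)
r2_inverse = (d1 - 1.0) / d1                        # as in (9a)
partial_12 = -row1[1] / sqrt(d1 * d2)               # variables 3-7 held constant

print(round(d1, 4), round(beta[0], 4), round(r2_inverse, 4), round(partial_12, 4))
```

Only the rows of $R^{-1}$ that are actually needed are computed, which mirrors the remark that single rows of the subsequent matrices suffice in practice.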


Discussion 


The methods of regression analysis from uniqueness-augmented factor 
statistics given by Dwyer [1] are formulated in terms of determinants. 
Generalization of Dwyer’s method is possible for the oblique factor statistics. 
Both Dwyer’s method and the one presented here in matrix terms are readily 
adapted to machine methods of statistics. By having either method in terms 
of oblique factor statistics, multiple-group extraction methods may be used 
to minimize residual computations without requiring orthogonalization of the 
factor matrices. 

These techniques are useful when it is desired to obtain: (i) the regression
of each variable on the $n - 1$ remaining variables; (ii) the partial regression
of each pair of variables, holding constant the remaining $n - 2$ variables;
(iii) the regression weights for the prediction of test scores; (iv) the regression
weights for the prediction of factor scores. They can also be used to set up
standard procedures for routine treatment of batteries by machine methods
of statistics.


REFERENCES 


[1] Dwyer, P. S. The evaluation of multiple and partial correlation coefficients from the
factorial matrix. Psychometrika, 1940, 5, 211-232.

[2] Dwyer, P. S. The relative efficacy and economy of various test selection methods.
PRS Report 957, AGO. 12 June 1952.

[3] Guttman, L. Multiple rectilinear prediction and the resolution into components.
Psychometrika, 1940, 5, 75-99.

[4] Guttman, L. and Cohen, J. Multiple rectilinear prediction and the resolution into
components: II. Psychometrika, 1943, 8, 169-183.

[5] Horst, P. (Ed.) The prediction of personal adjustment. SSRC Bull., 1941, 48, pp.
437ff.

[6] Thorndike, R. L. Personnel selection. New York: Wiley, 1949.

[7] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.


Manuscript received 7/16/56 
Revised manuscript received 5/23/57 











PSYCHOMETRIKA—VOL. 23, NO. 1
MARCH, 1958 


ERROR OF MEASUREMENT AND THE SENSITIVITY OF 
A TEST OF SIGNIFICANCE 


J. P. SUTCLIFFE*
UNIVERSITY OF SYDNEY 


Implications of random error of measurement for the sensitivity of 
the F test of differences between means are elaborated. By considering the 
mathematical models appropriate to design situations involving true and 
fallible measures, it is shown how measurement error decreases the sensitivity 
of a test of significance. A method of reducing such loss of sensitivity is 
described and recommended for general practice. 


In the statistical theory of sampling, explicit attention is given to 
sampling error, which refers to fluctuations in the composition of samples 
drawn at random from a defined universe. A second form of error, largely 
ignored in this context, is measurement error. This applies to the individual 
sampling units and is thus related to the definition of the universe rather 
than sampling outcomes. Applications of sampling theory have proceeded 
on the implicit assumption that the sampling units which make up the 
defined universe are error free, that (in psychometric terms) the universe 
consists of true scores. This assumption is not justified in practice, where 
measurement is seldom free from error. Parameters, such as the mean and 
the variance, of a universe of fallible scores will differ from those of a universe 
of true scores; tests of significance of a given effect will not necessarily be the 
same in the two cases. This paper elaborates the implications of measurement 
error for the simple case of the F test of difference between means. By setting 
up the mathematical models appropriate to the relevant design situations, 
it is shown how measurement error (relative to the parallel true score case) 
decreases the sensitivity of the test of significance. Sensitivity refers to the 
likelihood of detecting a nonzero population effect at a given level of
significance. Through its inverse, proneness to Type II error, it is usually
expressed quantitatively as power. A method of reducing such loss of sensitivity is
described. 


Definition of Universes of Scores 


The scale or range of application of a measuring instrument comprises
a number of units of measurement. Let w represent any one unit or subrange
of the scale and v any one occasion of measurement. Errors of measurement
constant for all units of the scale on all occasions of testing will be designated
$f$; errors constant for all occasions of measurement with a particular unit,
but variable from unit to unit, will be designated $g_w$; errors variable from
occasion to occasion and from unit to unit will be designated $h_{wv}$. For example,
a carpenter's tape may be incorrectly calibrated uniformly over the whole
scale; then unevenly stretched over the first few feet which are most commonly
used; and finally subject to random error on any given application. For this
case the total error of measurement $E = f + g_w + h_{wv}$. Analogous errors of
measurement occur with psychological tests [3], but these will not be
discussed here; while knowledge of the source of error can facilitate its control,
it is rather the mode of operation of error which is relevant to the statistical
argument.

*I wish to express my thanks in acknowledgement that the present form of this paper
has benefited from editorial comment, and from the advice of Dr. H. Mulhall of the
Department of Mathematics, University of Sydney.

Most generally, an obtained fallible measure or score, $X_i$, can be
expressed as the sum of the true score, $T_i$, and its error of measurement,
$E_i$ [3]. This holds whether measurement error is unitary, or complex in the
sense illustrated above. The additive relationship also holds whatever other
relationship may be shown to obtain between true score and error for a
universe of obtained scores. For instance, while $E'$ may enter as a multiplier
in the relationship between obtained and true score, $X_i = E'T_i$, $X_i$ may
also be written $X_i = T_i + E_i$, where $E_i = (E' - 1)T_i$. Other assumptions
about the nature of error and its relationship to true score are tenable, but
the additive assumption is adopted here because it simplifies the subsequent
analysis.

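The mean and variance of an infinite universe of fallible scores $X_i =
T_i + E_i$ may be obtained as follows:

Mean $= \lim_{N\to\infty}\left[\sum^{N} X_i/N\right] = \lim_{N\to\infty}\left[\sum^{N} (T_i + E_i)/N\right] = \bar{T} + \bar{E}.$

Variance $= \lim_{N\to\infty}\left[\sum^{N} x_i^2/N\right] = \lim_{N\to\infty}\left[\sum^{N} (t_i + e_i)^2/N\right] = \sigma_t^2 + \sigma_e^2 + 2\rho_{te}\sigma_t\sigma_e.$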
These outcomes are summarized in Table 1. Depending upon the mode of
operation of error, cases may arise where any or all of $\bar{E}$, $\sigma_e^2$, and $\rho_{te}$ are zero,
in which cases one or more of the parameters will be common to the universes
of true and obtained scores.


TABLE 1

Parameters of Universes of True, Error and Obtained Scores

Universe     Mean                   Variance
True         $\bar{T}$              $\sigma_t^2$
Error        $\bar{E}$              $\sigma_e^2$
Obtained     $\bar{T} + \bar{E}$    $\sigma_t^2 + \sigma_e^2 + 2\rho_{te}\sigma_t\sigma_e$
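The decomposition of the mean and variance of fallible scores holds for any finite set of scores as well as in the limit, which makes it easy to check by direct computation. In the sketch below the true scores `T` and errors `E` are made-up numbers and the helper names are hypothetical; population (divide-by-N) moments are used to match the limiting definitions above.

```python
def mean(xs):
    return sum(xs) / len(xs)


def pvar(xs):
    """Population variance (divisor N)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def pcov(xs, ys):
    """Population covariance, i.e. rho * sigma_x * sigma_y."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


T = [10.0, 12.0, 15.0, 11.0, 17.0]     # hypothetical true scores
E = [0.5, -1.0, 1.5, -0.5, 2.0]        # hypothetical errors of measurement
X = [t + e for t, e in zip(T, E)]      # fallible scores, X = T + E

mean_lhs, mean_rhs = mean(X), mean(T) + mean(E)
var_lhs = pvar(X)
var_rhs = pvar(T) + pvar(E) + 2.0 * pcov(T, E)

print(mean_lhs, mean_rhs)
print(var_lhs, var_rhs)
```

Setting the mean of `E`, its variance, or its covariance with `T` to zero reproduces the special cases in which a parameter is shared by the true and obtained universes.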

When error is absent, the mean $= \bar{T}$ and variance $= \sigma_t^2$ (Case 1). When
error is constant, $\bar{E} = f \neq 0$, $\sigma_e^2 = 0$, $\rho_{te} = 0$; hence the mean of fallible
scores $= \bar{T} + f$, and variance $= \sigma_t^2$ (Case 2). Where error is variable its
distribution may be either random or nonrandom. (In either case, the
variances of error about different true score values may be homogeneous
or heterogeneous. Heterogeneity of variance permits nonzero correlation
between true scores and the variance of errors about them, but, as in random
sampling, this correlation is independent of $\rho_{te}$. Heterogeneity of error variance
should, of course, be taken into account in any analysis of variance [2].)
If errors occur at random about $T_i$, then $\bar{E} = 0$, $\sigma_e^2 > 0$, and $\rho_{te} = 0$; hence
mean $= \bar{T}$, and variance $= \sigma_t^2 + \sigma_e^2$ (Case 3). If errors are randomly
distributed about $T_i + f$, $\bar{E} = f \neq 0$, $\sigma_e^2 > 0$, $\rho_{te} = 0$; hence mean $= \bar{T} + f$,
and variance $= \sigma_t^2 + \sigma_e^2$ (Case 4). Where errors are distributed randomly
about $T_i + g_w$, then $\bar{E} = \bar{g} \neq 0$, $\sigma_e^2 > 0$, $\rho_{te} > 0$, and hence mean $= \bar{T} + \bar{g}$,
and variance $= \sigma_t^2 + \sigma_e^2 + 2\rho_{te}\sigma_t\sigma_e$ (Case 5). With nonrandom distribution
of errors, generally one would find $\bar{E} \neq 0$, $\sigma_e^2 > 0$, and $\rho_{te} > 0$. Whether
errors are distributed about $T_i$, $T_i + f$, or $T_i + g_w$, mean $= \bar{T}$ + error,
and variance $= \sigma_t^2 + \sigma_e^2 + 2\rho_{te}\sigma_t\sigma_e$. All cases of nonrandom distribution
of error are here referred to as Case 6.

The six cases are summarized in Table 2 to enable comparison of the
parameters of fallible score universes with those of the true score universe.
In no case are both parameters the same as those in Case 1; however, Case 2
has the same variance, and Case 3 the same mean. Cases 1 and 2 are unlikely
to occur in practice. Most experiments aim to achieve the conditions of Case
3, but the intrusion of constant errors, scale biases, and other nonrandom
errors makes Cases 4, 5, and 6 quite common. The following discussion will
center on Cases 1 and 3, with incidental comment on the others.


TABLE 2

Parameters of Universes of True and Fallible Scores

Case    Mean                  Variance
 1      $\bar{T}$             $\sigma_t^2$
 2      $\bar{T} + f$         $\sigma_t^2$
 3      $\bar{T}$             $\sigma_t^2 + \sigma_e^2$
 4      $\bar{T} + f$         $\sigma_t^2 + \sigma_e^2$
 5      $\bar{T} + \bar{g}$   $\sigma_t^2 + \sigma_e^2 + 2\rho_{te}\sigma_t\sigma_e$
 6      $\bar{T}$ + error     $\sigma_t^2 + \sigma_e^2 + 2\rho_{te}\sigma_t\sigma_e$













Comparison of the Design Models 


With the universes of true and fallible scores defined, it becomes possible 
to compare the sensitivity of tests of significance applied in given cases. For 
comparative purposes the analysis of variance for Case 1 will be described. 
Then two analyses for Case 3 will be considered—the first reflecting common 
practice, the second involving random replication of measurement to increase
reliability and hence sensitivity.


Notation and plan for Case 1 

Consider the comparison of means of independent random samples of
true scores obtained at different levels of a single-treatment classification.
Let $i = 1, 2, \cdots, a$ represent any one of the treatment levels within the
treatment classification A. Let $j = 1, 2, \cdots, b$ represent any one subject
in a sample of subjects B. Then $X_{ij}$ is the true score of the subject j in the
treatment level or group i. As subjects are randomly sampled, j represents
number only, not rank within a group. Let a dot in place of a subscript
represent summation across the class indicated by the subscript replaced, e.g.,

$X_{i.} = \sum_{j=1}^{b} X_{ij}; \qquad X_{..} = \sum_{i=1}^{a}\sum_{j=1}^{b} X_{ij}.$

The sample values of $X_{ij}$ and the sums are represented in Table 3.


TABLE 3

Plan of Obtained Scores of Subjects Within Random Samples
Allocated to Independent Treatment Groups

                               B  Subject
A Treatment      1          2       ...      j       ...      b         Sum
     1        $X_{11}$   $X_{12}$   ...   $X_{1j}$   ...   $X_{1b}$   $X_{1.}$
     2        $X_{21}$   $X_{22}$   ...   $X_{2j}$   ...   $X_{2b}$   $X_{2.}$
     .
     i        $X_{i1}$   $X_{i2}$   ...   $X_{ij}$   ...   $X_{ib}$   $X_{i.}$
     .
     a        $X_{a1}$   $X_{a2}$   ...   $X_{aj}$   ...   $X_{ab}$   $X_{a.}$
                                                                      $X_{..}$





Analysis of variance for Case 1 
The total variance of the ab sample values of $X_{ij}$ can be expressed in
terms of two sources of variation: between treatments, $A$, and between
subjects within treatment levels, $B_w$. A given deviation score may be written as

$x_{ij} = (X_{ij} - \bar{X}_{..}) = (\bar{X}_{i.} - \bar{X}_{..}) + (X_{ij} - \bar{X}_{i.}).$

The total sum of squares is

$SS = \sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{..})^2 = b\sum_{i=1}^{a} (\bar{X}_{i.} - \bar{X}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{i.})^2.$

The degrees of freedom pertaining to these components are Total = $(ab - 1)$,
$A = (a - 1)$, $B_w = a(b - 1)$. From the SS and df, the mean squares, $S^2$,
may be obtained as unbiased estimates (on the null hypothesis) of a common
population variance.
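The partition of the total sum of squares and the resulting mean squares can be followed on a small data set. The scores in `groups` below are hypothetical (a = 3 treatment levels, b = 3 subjects each), and the variable names are illustrative rather than taken from the paper.

```python
# Hypothetical scores: one inner list per treatment level, b subjects in each.
groups = [[3.0, 5.0, 4.0],
          [6.0, 8.0, 7.0],
          [9.0, 9.0, 12.0]]
a, b = len(groups), len(groups[0])

group_means = [sum(g) / b for g in groups]
grand_mean = sum(sum(g) for g in groups) / (a * b)

# Between treatments (A), within treatments (B_w), and total sums of squares.
ss_a = b * sum((m - grand_mean) ** 2 for m in group_means)
ss_bw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)

df_a, df_bw, df_total = a - 1, a * (b - 1), a * b - 1
ms_a, ms_bw = ss_a / df_a, ss_bw / df_bw     # the mean squares S^2
f_ratio = ms_a / ms_bw                       # compared with the tabled F

print(ss_a, ss_bw, ss_total)
print(ms_a, ms_bw, f_ratio)
```

For these numbers the between and within components are 54 and 10, which add to the total of 64, and the degrees of freedom 2 and 6 add to 8.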


Expectation of mean squares for Case 1 


To determine what is estimated by a given $S^2$, one takes the expectation
according to the model involved. As Case 1 involves a universe of true scores,
Model 1 can be written as

$X_{ij} = A_i + B_{ij}.$

$A_i$ is the class of treatment parameters of which the sampled treatment
means are estimators. The distribution of $A_i$ will vary according as
treatments are fixed constants or randomly sampled. For convenience the case
of random $A_i$ with variance $\sigma_a^2$ will be considered. $B_{ij}$ is the class of true score
deviations from $A_i$, which are normally distributed with zero mean and
variance $\sigma_b^2$. To find the expected values of SS and then $S^2$, one substitutes
model values in the analysis of sample variance and thereby determines the
limiting value of a given component.

(i) Expectation of $S_A^2$

$(\bar{X}_{i.} - \bar{X}_{..}) = (A_i - \bar{A}) + (\bar{B}_{i.} - \bar{B}_{..});$

$E\left[b\sum_{i=1}^{a} (\bar{X}_{i.} - \bar{X}_{..})^2\right] = E\left[b\sum_{i=1}^{a} (A_i - \bar{A})^2\right] + E\left[b\sum_{i=1}^{a} (\bar{B}_{i.} - \bar{B}_{..})^2\right] = b(a - 1)\sigma_a^2 + b(a - 1)\sigma_b^2/b.$

Thus $S_A^2 = b\sum_{i=1}^{a} (\bar{X}_{i.} - \bar{X}_{..})^2/(a - 1) \to b\sigma_a^2 + \sigma_b^2.$

(ii) Expectation of $S_{B_w}^2$

$(X_{ij} - \bar{X}_{i.}) = (B_{ij} - \bar{B}_{i.}),$

and

$S_{B_w}^2 = \sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{i.})^2/a(b - 1) \to \sigma_b^2.$













TABLE 4

Analysis of Variance for Model 1:
Single Treatment Classification Design with
b Randomly Sampled Subjects for Each of a Levels (True Scores)

Number  Source       Sum of Squares                                           df         $S^2$         Expectation of $S^2$
  1     A            $b\sum_{i}(\bar{X}_{i.} - \bar{X}_{..})^2$               (a - 1)    $S_A^2$       $b\sigma_a^2 + \sigma_b^2$
  2     B within A   $\sum_{i}\sum_{j}(X_{ij} - \bar{X}_{i.})^2$              a(b - 1)   $S_{B_w}^2$   $\sigma_b^2$
  3     Total        $\sum_{i}\sum_{j}(X_{ij} - \bar{X}_{..})^2$              (ab - 1)





These outcomes for the analysis of variance are summarized in Table 4.
On the null hypothesis $\sigma_a^2 = 0$. One rejects the null hypothesis if the ratio
$F_1 = S_A^2/S_{B_w}^2$ with $df_1 = (a - 1)$ and $df_2 = a(b - 1)$ exceeds $F_\alpha$, the tabled
value for the chosen level of significance.


Case 3 


It is common practice in psychological experimentation to use a design 
superficially similar to the one just described. That is, one has a series of 
random samples of subjects allocated to treatment levels and for each subject 
one has a single score. If, as is usually the case, the scores are fallible, then 
Model 1 is inapplicable and instead one must write the model to include 
error of measurement. Assuming that the scores have been drawn from a Case 
3 universe, there will be two designs according as one has or has not random 
replication of measurement on a given subject. For common practice, which 
provides no measurement replication, Model 3a is 


$X_{ij} = A_i + B_{ij} + I_{ij}$.


$A_i$ and $B_{ij}$ have been defined above; $I_{ij}$ is the random error of
measurement component, normally distributed with zero mean and variance
$\sigma_e^2$. The summary of the analysis of variance for Model 3a is given in
Table 5. For the test of significance, the null hypothesis is $\sigma_a^2 = 0$.
One rejects the null hypothesis if the ratio $F_{3a} = S_a^2/S_{b_a}^2$ with
$df_1 = (a - 1)$ and $df_2 = a(b - 1)$ exceeds the tabled value of F for the
chosen level of significance.

One may note that the terms $\sigma_a^2$ and $\sigma_b^2$ are common to the
expectations of $S_a^2$ for Models 1 and 3a. In addition, the $df_1$ and $df_2$
are the same for $F_1$ and $F_{3a}$. This enables comparison of the sensitivity
of the two tests. The power of the $F_1$ test is
Prob $\{F_1 > F_\alpha \sigma_b^2/(b\sigma_a^2 + \sigma_b^2)\}$; and the power of
$F_{3a}$ is
Prob $\{F_{3a} > F_\alpha(\sigma_b^2 + \sigma_e^2)/(b\sigma_a^2 + \sigma_b^2 + \sigma_e^2)\}$.
The smaller the value to the right of >, the greater the power of the test. As
$\sigma_b^2/(b\sigma_a^2 + \sigma_b^2) < (\sigma_b^2 + \sigma_e^2)/(b\sigma_a^2 + \sigma_b^2 + \sigma_e^2)$,
the power of $F_1$ is greater than the power of $F_{3a}$. That is, analysis in
accordance with Model 3a provides a less sensitive test of the hypothesis
$\sigma_a^2 > 0$ than does Model 1; the loss of sensitivity is due to the
intrusion of random error of measurement.


TABLE 5


Analysis of Variance for Model 3a:
Single Treatment Classification Design with
b Randomly Sampled Subjects for Each of a Levels (Fallible Scores)


Number  Source      Sum of Squares                                       df        s^2          Expectation of s^2
1       A           $b \sum_i (\bar{X}_{i.} - \bar{X}_{..})^2$           (a - 1)   $S_a^2$      $b\sigma_a^2 + \sigma_b^2 + \sigma_e^2$
2       B within A  $\sum_i \sum_j (X_{ij} - \bar{X}_{i.})^2$            a(b - 1)  $S_{b_a}^2$  $\sigma_b^2 + \sigma_e^2$
3       Total       $\sum_i \sum_j (X_{ij} - \bar{X}_{..})^2$            (ab - 1)

Model 3a allows for the acknowledgement of the presence of error
variance, but there is no provision for its isolation. To achieve this, one has
to add random replication of measurement for each subject. That is, instead
of a single score for each subject one has a number of scores. This introduces
a source of variation in addition to those already accounted for; accordingly
the notation and plan presented above have to be expanded. Let
$k = 1, 2, \cdots, c$ represent any one measure or score in a sample of scores
C. Then $X_{ijk}$ is the kth score of subject j at treatment level i. As
measures on subjects are randomly sampled, k represents number only, not rank.
Now Model 3b may be written as


$X_{ijk} = A_i + B_{ij} + I_{ijk}$.


$A_i$ and $B_{ij}$ have been defined above; and $I_{ijk}$ is defined as was
$I_{ij}$. That is, Model 3a is the special case of Model 3b in which k = 1. The
summary of the analysis of variance for Model 3b is given in Table 6. This
analysis provides two tests of significance.

For the first, the null hypothesis is $\sigma_b^2 = 0$. One rejects the null
hypothesis if the ratio $F_{3b} = S_{b_a}^2/S_{c_b}^2$ with $df_1 = a(b - 1)$ and
$df_2 = ab(c - 1)$ exceeds the tabled value of F for the chosen level of
significance. If the null hypothesis is not rejected, the outcome is consistent
with the homogeneity of experimental subjects, and in that sense one has zero
reliability of measurement. If the null hypothesis is rejected, an estimate of
the reliability of measurement may be obtained. With the Case 3 universe, the
population value of the reliability coefficient [1] is
$\rho_{xx} = \sigma_b^2/(\sigma_b^2 + \sigma_e^2)$, which may be estimated by










TABLE 6


Analysis of Variance for Model 3b:
Single Treatment Classification Design with
c Random Measures on each of
b Randomly Sampled Subjects for each of
a Levels (Fallible Scores)


Number  Source      Sum of Squares                                                df         s^2          Expectation of s^2
1       A           $bc \sum_i (\bar{X}_{i..} - \bar{X}_{...})^2$                 (a - 1)    $S_a^2$      $bc\sigma_a^2 + c\sigma_b^2 + \sigma_e^2$
2       B within A  $c \sum_i \sum_j (\bar{X}_{ij.} - \bar{X}_{i..})^2$           a(b - 1)   $S_{b_a}^2$  $c\sigma_b^2 + \sigma_e^2$
3       C within B  $\sum_i \sum_j \sum_k (X_{ijk} - \bar{X}_{ij.})^2$            ab(c - 1)  $S_{c_b}^2$  $\sigma_e^2$
4       Total       $\sum_i \sum_j \sum_k (X_{ijk} - \bar{X}_{...})^2$            (abc - 1)





$r_{xx} = (S_{b_a}^2 - S_{c_b}^2)/[S_{b_a}^2 - S_{c_b}^2(1 - c)]$.
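The reliability estimate can be checked numerically. The following is a minimal sketch, not part of the original paper: the function name and the variance components are hypothetical, and the expected mean squares of Table 6 are substituted for sample mean squares to confirm that the estimator reduces to $\sigma_b^2/(\sigma_b^2 + \sigma_e^2)$.

```python
def reliability_estimate(ms_b_within_a, ms_c_within_b, c):
    """Reliability estimate from the two within mean squares of Model 3b.

    Implements (S_b^2 - S_c^2) / [S_b^2 - S_c^2 (1 - c)], where c is the
    number of random measures per subject.
    """
    return (ms_b_within_a - ms_c_within_b) / (
        ms_b_within_a - ms_c_within_b * (1 - c)
    )


# Hypothetical variance components, chosen only for illustration.
sigma_b2, sigma_e2, c = 4.0, 2.0, 3

# Expected mean squares taken from the last column of Table 6.
ems_b_within_a = c * sigma_b2 + sigma_e2
ems_c_within_b = sigma_e2

r_xx = reliability_estimate(ems_b_within_a, ems_c_within_b, c)
rho_xx = sigma_b2 / (sigma_b2 + sigma_e2)
print(r_xx, rho_xx)
```

With sample mean squares in place of their expectations the two quantities agree only approximately, since the estimate is then subject to sampling error.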


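For the second, the null hypothesis is $\sigma_a^2 = 0$. One rejects the null
hypothesis if the ratio $F'_{3b} = S_a^2/S_{b_a}^2$ with $df_1 = (a - 1)$ and
$df_2 = a(b - 1)$ exceeds the tabled value of F for the chosen level of
significance.

Comparison of the power of the $F'_{3b}$ test

Prob $\{F'_{3b} > F_\alpha(c\sigma_b^2 + \sigma_e^2)/(bc\sigma_a^2 + c\sigma_b^2 + \sigma_e^2)\}$

with the powers of $F_1$ and $F_{3a}$ shows that as

$\frac{\sigma_b^2}{b\sigma_a^2 + \sigma_b^2} < \frac{c\sigma_b^2 + \sigma_e^2}{bc\sigma_a^2 + c\sigma_b^2 + \sigma_e^2} < \frac{\sigma_b^2 + \sigma_e^2}{b\sigma_a^2 + \sigma_b^2 + \sigma_e^2}$,


then power $F_1$ > power $F'_{3b}$ > power $F_{3a}$.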
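The ordering of the three multipliers of $F_\alpha$ can be illustrated with a short computation. This is a sketch under assumed values: the function name and the numerical variance components are hypothetical and serve only to show that the inequality holds for a case with $c > 1$ and nonzero error variance.

```python
def critical_multipliers(sigma_a2, sigma_b2, sigma_e2, b, c):
    """Return the multipliers of F_alpha for the F_1, F'_3b and F_3a tests.

    A smaller multiplier means a greater power for the corresponding test.
    """
    m_1 = sigma_b2 / (b * sigma_a2 + sigma_b2)
    m_3b = (c * sigma_b2 + sigma_e2) / (
        b * c * sigma_a2 + c * sigma_b2 + sigma_e2
    )
    m_3a = (sigma_b2 + sigma_e2) / (b * sigma_a2 + sigma_b2 + sigma_e2)
    return m_1, m_3b, m_3a


# Hypothetical components: treatment, subject and error variances.
m_1, m_3b, m_3a = critical_multipliers(1.0, 4.0, 2.0, b=10, c=3)
print(m_1, m_3b, m_3a)
```

Dividing numerator and denominator of the middle term by c shows why the ordering arises: replication reduces the effective error variance from $\sigma_e^2$ to $\sigma_e^2/c$, which lies between zero (Model 1) and $\sigma_e^2$ (Model 3a).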








While analysis by the Model 3b allows for isolation of an estimate
of $\sigma_e^2$, it is important to note that one may not convert $F'_{3b}$ to
$F_1$ by subtracting $S_{c_b}^2$ (which estimates $\sigma_e^2$) from the
numerator and denominator of $F'_{3b}$ and making appropriate adjustments for
the weights b and c. F is the ratio of two independent $\chi^2$ variates; the
independence is negated by such a procedure. The only way to achieve the
standard of sensitivity of the $F_1$ test with the given number of subjects is
to use error-free measurement. As this is an ideal towards which one can do no
more than strive, one has to be satisfied with a less sensitive test. Of the
two remaining experimental designs, assuming that one can achieve measurement
replication, that which provides the 3b form of analysis is to be recommended
for general practice. It yields estimates of measurement error variance and
reliability, for the latter a test of significance, as well as providing a more
sensitive test of treatment effects than the 3a design using the same number of
subjects. These contentions apply with equal force to the design situations
where the t test is ordinarily applied. Finally, while the argument has been in
terms of the single treatment classification design, it may be generalized to
multiple classification designs.


REFERENCES 


[1] Alexander, H. W. The estimation of reliability when several trials are available. Psycho- 
metrika, 1947, 12, 79-99. 

[2] Ehrenberg, A. S. C. The unbiased estimation of heterogeneous error variances. Bio-
metrika, 1950, 37, 347-357.

[3] Walker, Helen M. and Lev, J. Statistical inference. New York: Holt, 1953. 


Manuscript received 1/14/57 
Revised manuscript received 4/30/57








PSYCHOMETRIKA—VOL, 23, NO. 1 
MARCH, 1958 


DETERMINATION OF PARAMETERS OF A FUNCTIONAL 
RELATION BY FACTOR ANALYSIS* 


LEDYARD R TUCKER
PRINCETON UNIVERSITY 
AND 
EDUCATIONAL TESTING SERVICE 


Consideration is given to determination of parameters of a functional 
relation between two variables by the means of factor analysis techniques. 
If the function can be separated into a sum of products of functions of the 
individual parameters and corresponding functions of the independent 
variable, particular values of the functions of the parameters and of the 
functions of the independent variables might be found by factor analysis. 
Otherwise approximate solutions may be determined. These solutions may 
represent important results from experimental investigations. 


The possible use of factor analysis techniques to determine parameters 
of nonlinear functional relations has been a topic for occasional informal 
discussion. If a factorial approach could be developed it would have con- 
siderable application to experimental problems such as learning curves, 
work decrement curves, dark adaptation curves, etc. This note gives a 
theoretical basis for determination of parameters by factor analysis for 
many nonlinear functions. 

Factor analytic methods have been limited to investigations applying 
linear functions of the form (see [2], equation 3, p. 71): 


(1)  $s_{ji} = \sum_{m=1}^{r} a_{jm} s_{mi}$,


where the $s_{ji}$ are the observations, and $a_{jm}$ and $s_{mi}$ are to be
estimated. The $a_{jm}$ are task parameters, and the $s_{mi}$ are individual
parameters.

*This research was jointly supported in part by Princeton University, the
Office of Naval Research under contract N6onr-270-20, and the National Science
Foundation under grant NSF G-642.

In the present context we will consider the functional relation between
two variables x and y. Variable x might be termed the independent variable
and y might be termed the dependent variable. A general statement of this
functional relation for any given individual i is given by


(2)  $y_i = \phi(p_{gi}, x)$,

for which there are a number of parameters $p_g$ which have specific values
$p_{gi}$ for each individual. Such a relation is shown graphically in Fig. 1.
There exists a family of functions of the form of any given $\phi$ with the
values of $p_{gi}$ defining the particular member of the family. Let j be a
particular point of this function with coordinates $x_j$ and $y_{ji}$. Then


(3)  $y_{ji} = \phi(p_{gi}, x_j)$.


Figure 1
A Functional Relation of the Form of (2)


Many functions may be transformed so as to produce


(4)  $y_{ji} = \sum_{m=1}^{r} f_m(x_j) F_m(p_{gi})$.


The $f_m(x_j)$ are a number of functions of the independent variable $x_j$. The
$F_m(p_{gi})$ are corresponding functions of the parameters $p_{gi}$. The
number, r, of such functions may be finite, or it may be infinite. In this
latter case, (4) represents an infinite series, such as Maclaurin's or Taylor's
power series or Fourier's trigonometric series (see a standard advanced
calculus text, e.g., [1], [3]). Frequently, in this case, a small number of
terms of the series will yield an adequate approximation to the $y_{ji}$. In
order to make (1) applicable it is only necessary to define


(5)  $a_{jm} = f_m(x_j)$;

(6)  $s_{mi} = F_m(p_{gi})$.

Then

(7)  $y_{ji} = \sum_{m=1}^{r} a_{jm} s_{mi}$.
m=1 


In the present context the $s_{mi}$ will be considered as derived parameters
of the transformed function. While they may be expressible in terms of more
primitive parameters, they do have the property of determining the particular
function for each individual. The family of functions is defined by the
$a_{jm}$. As a consequence of (7), observations of $y_{ji}$ for several given
$x_j$ and individuals i may be entered into a score matrix. Each $x_j$ might be
used to produce one statistical variable. Estimates of the $a_{jm}$ and
$s_{mi}$ then can be obtained by factor analysis techniques.

In order to illustrate the foregoing, consider a learning task for which
the learning curve is a simple exponential function, such as


(8)  $y_{ji} = e^{(t_j + b_i)}$,

where $y_{ji}$ is the performance of individual i on trial j, $b_i$ is a
parameter for individual i, and $t_j$ is the number of trials j. $t_j$ replaces
$x_j$ as the independent variable in this context, and $b_i$ replaces the
parameters $p_{gi}$. Equation (8) may be transformed to


(9)  $y_{ji} = (e^{t_j})(e^{b_i})$.

Then

(10)  $a_{j1} = f_1(t_j) = e^{t_j}$,
(11)  $s_{1i} = F_1(b_i) = e^{b_i}$.


In this case only one term of the sum of products indicated in (4) and (7)
exists. From (9), (10), and (11)


(12)  $y_{ji} = a_{j1} s_{1i}$.


For this simple case, observations are made of the performances on the
learning task for each of a number of individuals at each of a selected number
of trials. These observations yield a matrix of $y_{ji}$. A factor analysis
will involve a single factor and yield estimates of the $a_{j1}$ and $s_{1i}$.
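The single-factor structure of (12) can be illustrated on error-free data. The sketch below is not a factor analysis proper: it only shows that a matrix generated by (8) is an exact product of one column of trial values and one row of individual values. The function name, the trial values, and the individual parameters are hypothetical, and the scale is fixed arbitrarily by setting the first trial value to one, since the factorization is determined only up to a constant multiplier.

```python
import math


def rank_one_factors(y):
    """Split a positive rank-one matrix y[j][i] into a[j] * s[i].

    The scale is arbitrary; here a[0] is set to 1 so that s equals the
    first row of y.
    """
    a = [row[0] / y[0][0] for row in y]
    s = list(y[0])
    return a, s


# Hypothetical trial numbers t_j and individual parameters b_i.
t = [1.0, 2.0, 3.0, 4.0]
b = [-0.5, 0.0, 0.7]

# Error-free observations generated by the exponential curve (8).
y = [[math.exp(tj + bi) for bi in b] for tj in t]

a, s = rank_one_factors(y)
max_error = max(
    abs(y[j][i] - a[j] * s[i])
    for j in range(len(t))
    for i in range(len(b))
)
print(max_error)
```

The recovered a[j] are proportional to $e^{t_j}$ and the recovered s[i] are proportional to $e^{b_i}$; with fallible observations the product would hold only approximately and a fitting method would be needed.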

The factor analysis problems of communalities and rotation of axes
remain to be discussed. In the present context it seems appropriate to assume
that each observed $y_{ji}$ may be in error, but the assumption of specific
factors seems inappropriate. As a consequence, reliability estimates should be
placed in the diagonals of the matrix of intercorrelations. The rotation of
axes problem remains unsolved in the present case. The solution is not unique,
and the axes may be rotated. It is doubtful, moreover, that the principle of
simple structure is applicable when the factor loadings are the various values
of the functions $f_m(x_j)$ for the selected points. Some other principle, at
present unknown, is needed to fix the location of the axes.

An alternative interpretation of (7) corresponds to the obverse factor
procedures, where people are correlated over a population of measures. A
large number of values of $x_j$ are selected, and the $y_{ji}$ are observed for
a group of individuals. Each of these individuals can be considered as a
variable and correlations of the $y_{ji}$ can be obtained for pairs of
individuals. The $s_{mi}$ are now the factor loadings, and the $a_{jm}$ are the
factor scores. The communalities and rotation of axes aspects of the analysis
are quite similar to the corresponding aspects of the first procedure already
discussed. One important difference between the present analysis by persons
and the previous alternative stems from the more direct determination of the
$s_{mi}$. An inspection of the matrix of $s_{mi}$ might reveal a curvilinear
relation between the $s_{mi}$ for several m. Any such relation as the entries
in one row being proportional to the square of the entries in another row would
indicate a relation to a common, more primitive parameter. The entries in one
row being proportional to the product of corresponding entries in two other
rows would also be indicative of more primitive parameters. Rotation of axes
might be performed so as to reveal such relations.

In any particular situation, the choice as to which variable is to be the
independent variable x and which variable is to be the dependent variable y
may be quite important. In a learning experiment for a list of paired
associates, each trial might be an $x_j$, and the proportion of correct
responses be the observed $y_{ji}$. However, selected proportions of correct
responses might be taken as the $x_j$, and the numbers of trials necessary to
reach these proportions taken as the $y_{ji}$. Consider a slightly more complex
exponential learning curve than that given in (8), such that

(13)  $P = e^{(c_i t - b_i)}$,

where P is the measure of performance. The parameter $c_i$ has been included
as a multiplier to t. This function does not separate in the manner that (8)
did unless an infinite series is used. In which case, if values of $t_j$ are
chosen and values of $P_{ji}$ are observed, the factor analysis will not
involve a definite number of factors. Each successive factor will permit a
closer approximation of the series to the function. Some finite number of
factors might be found to be adequate.

If logarithms are taken of both sides of (13), it is possible to solve for t
as a function of P:


(14)  $t = \frac{1}{c_i} \log P + \frac{b_i}{c_i}$.


When values of P are selected as $P_j$ and the corresponding $t_{ji}$ are
observed, then


(15)  $t_{ji} = \frac{1}{c_i} \log P_j + \frac{b_i}{c_i}$.

Define

(16)  $a_{j1} = \log P_j$,
(17)  $s_{1i} = 1/c_i$,
(18)  $a_{j2} = 1$,
(19)  $s_{2i} = b_i/c_i$.

Then

(20)  $t_{ji} = a_{j1} s_{1i} + a_{j2} s_{2i}$,


which is in the form of (7). Only two factors are involved.
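Because the two loadings in (20) are known functions of the selected proportions, the individual parameters can be recovered in the error-free case by fitting a straight line for each individual. The sketch below assumes (15) in the form $t_{ji} = (1/c_i) \log P_j + b_i/c_i$; the helper name, the proportions, and the pairs $(c_i, b_i)$ are hypothetical. It is an ordinary least-squares line fit, not a factor analysis.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical selected proportions P_j and loadings a_j1 = log P_j.
P = [0.2, 0.4, 0.6, 0.8]
a1 = [math.log(p) for p in P]

# Hypothetical individual parameters (c_i, b_i).
individuals = [(0.5, 2.0), (0.8, 3.0)]

recovered = []
for c_i, b_i in individuals:
    # Error-free trials-to-criterion t_ji under the assumed form of (15).
    t = [(a_j1 + b_i) / c_i for a_j1 in a1]
    s1, s2 = fit_line(a1, t)  # s1 estimates 1/c_i, s2 estimates b_i/c_i
    recovered.append((s1, s2))

print(recovered)
```

The slope and intercept for each individual correspond to $s_{1i}$ and $s_{2i}$ of (17) and (19), from which $c_i = 1/s_{1i}$ and $b_i = s_{2i}/s_{1i}$ follow.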
Another extension from (8) is to introduce an additive constant $d_i$:


(21)  $P = d_i + e^{(t + b_i)}$.


Individual parameters and the variable t may be separated for (21) in the
same manner as given for (8). There are now two factors.

If both of the foregoing extensions of (8) are incorporated into a single
extension, then


(22)  $P = d_i + e^{(c_i t - b_i)}$.


The individual parameters do not readily separate now from either variable
without employing an infinite series.

It is to be noted that (8) might be treated in the same manner as was
(13). The individual parameters might be separated from the variable y or P
rather than from t as given. Thus, the foregoing examples include (i) a
function, equation (8), that may be treated either way; (ii) two functions,
(13) and (21), each of which may be treated in only one manner; and (iii) a
function, (22), that cannot be separated. The two single treatment functions
form a contrast as to which variable, P or t, is taken as the independent
variable. In (13), P should be taken as the independent variable while in (21)
t should be taken as the independent variable. In any particular experimental
case, the decision as to which variable is to be treated as the independent
variable must rest on experience and the judgment of the experimenter. There
are cases where the number of factors is excessive whichever variable is taken
as the independent variable. The factorial approach may yield in some of these
cases an adequate approximation to the observations with a limited number of
factors.


REFERENCES 


[1] Osgood, W. F. Advanced calculus. New York: Macmillan, 1925. 
[2] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947. 
[3] Wilson, E. B. Advanced calculus. Boston: Ginn, 1911. 


Manuscript received 8/15/55 








PSYCHOMETRIKA—VOL, 23, No. 1 
MARCH, 1958 


THE INCLUSION OF RESPONSE TIMES WITHIN A STOCHASTIC 
DESCRIPTION OF THE LEARNING BEHAVIOR OF 
INDIVIDUAL SUBJECTS 


R. J. AUDLEY 


UNIVERSITY COLLEGE, LONDON 


A stochastic process applicable to the learning behavior of an individual 
subject is discussed. The process describes both the response times and the 
sequence of choices obtained from a situation involving two alternatives. 
Parameter estimates and techniques for assessing goodness of fit are
considered.


In a previous paper [2], the possibility of providing a probabilistic 
description of the learning behavior of an individual subject was discussed. 
A family of stochastic processes suitable for this purpose was introduced, 
and problems of parameter estimation and goodness of fit were examined. 
This examination was restricted to the description of the sequence of responses 
made by a subject in an experimental situation involving a choice between two 
alternatives, e.g., the learning of a position habit in a single-unit T-maze. 
Usually, however, an investigator observes not only the choice made at each 
trial but also the time taken to make the choice, which for brevity will be 
referred to here as the response time. The present paper is an attempt to 
include the response times within the stochastic description elaborated in 
the earlier paper. 

This inclusion of response times carries with it several advantages. The 
estimation of parameter values can now be based upon a continuous time 
variable as well as the two-valued variable, success or failure, which was the 
only datum previously employed. Furthermore, there are certain sequences 
of responses, such as a long unbroken series of successes or failures, which 
make it impossible to provide parameter estimates unless response times 
can be used for this purpose. 


The Stochastic Processes 


Originally, the processes were based on an urn scheme. Here, however, 
they will be developed from some simple assumptions, which can be regarded 
as an identification of the elements of the urn scheme. To give a brief re- 
capitulation of the scheme of the earlier paper: consider an urn containing 
red and black balls, drawing a red ball being considered equivalent to the 
occurrence of a correct response, and a black ball to an incorrect response. 
The number of balls of the two colors is changed after a ball is drawn,
according to certain rules. In the present paper, the number of balls of a
particular color is identified with a hypothetical mean rate of making the
response associated with this color.

For the purpose of simple exposition, attention again will be restricted
to data obtained from learning situations involving only two alternative
responses, with one response consistently rewarded. At the tth trial, it is
assumed that the probability of a correct response occurring in a small time
interval $(T, T + \Delta T)$ is $r_t \Delta T$, and of an incorrect response in
the same time interval is $w_t \Delta T$. $r_t$ and $w_t$ may be regarded as
hypothetical mean rates of responding, i.e., the distribution of response times
for either response, taken individually, is exponential. This assumption was
considered for situations with only one available response by Mueller [10].
Christie [6] has also considered the two-choice situation as one involving the
competition between two responses emitted at independent random rates. His
paper should be consulted for a more detailed statement of the events supposed
to take place at any particular experimental trial.

The probability of no response occurring in time T will be


(1)  $P_0(T) = e^{-(r_t + w_t)T}$


(e.g., see Feller [7], p. 366).

In the learning situation being considered, the first response to occur
terminates an experimental trial. Hence, the probability of a correct response
occurring at any trial is the probability that this response is the first to
occur. The probability that a correct response terminates the tth trial at time
T is from (1) and the basic assumptions equal to


$e^{-(r_t + w_t)T} r_t \Delta T$,


and therefore the probability of a success at the tth trial is


(2)  $P(t) = \int_0^\infty e^{-(r_t + w_t)T} r_t \, dT = \frac{r_t}{r_t + w_t}$.


It is further assumed that the hypothetical response rates, $r_t$ and $w_t$,
are linear functions of the number of correct and incorrect responses in the
first t - 1 trials. Thus it is assumed

(3)  $r_t = r_1 + k_t a + (t - 1 - k_t) b$,

     $w_t = w_1 + k_t c + (t - 1 - k_t) d$,


where $r_1$ and $w_1$ are the initial rates of making correct and incorrect
responses, respectively, $k_t$ is the number of correct rewarded responses in
the first (t - 1) trials, and a, b, c, and d are parameters associated with the
influence of punishment and reward upon the hypothetical response rates.








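Substituting for $r_t$ and $w_t$ in (2), the probability of a correct response
on the tth trial, given $k_t$ previous successes, is


(4)  $P(t \mid k_t) = \frac{r_1 + k_t(a - b) + (t - 1)b}{r_1 + w_1 + k_t(a + c - b - d) + (t - 1)(b + d)}$.


Dividing numerator and denominator by $(r_1 + w_1)$, and putting


$p = \frac{r_1}{r_1 + w_1}$,  $\alpha = \frac{a}{r_1 + w_1}$,  $\beta = \frac{b}{r_1 + w_1}$,

$\gamma_1 = \frac{a + c}{r_1 + w_1}$,  $\gamma_2 = \frac{b + d}{r_1 + w_1}$,

gives


(5)  $P(t \mid k_t) = \frac{p + k_t(\alpha - \beta) + (t - 1)\beta}{1 + k_t(\gamma_1 - \gamma_2) + (t - 1)\gamma_2}$.


Equation (5) is the fundamental expression of the earlier paper [2].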

The distribution of response times at the tth trial is also completely
specified and is exponential. In particular, the mean response time, $L_t$, is
given by


(6)  $L_t = \int_0^\infty e^{-(r_t + w_t)T} (r_t + w_t) T \, dT = \frac{1}{r_t + w_t}$.

The relation between response times and probabilities here is based upon 
the very simplest of assumptions. Clearly, the assumptions concerning the 
hypothetical response rates and the relation between these rates and past 
experience can be readily modified. Also, in practice, it is unlikely that the 
general process, having six parameters, $r_1$, $w_1$, a, b, c, and d, would be used.
Special cases, with some of the parameters a, b, c, and d eliminated, or given 
particular values, would be more commonly employed. An application of 
such a special case to experimental data has been given elsewhere [1]. 

The relation between these stochastic processes and those suggested by 
other investigators, in particular by Bush and Mosteller [4, 5] and Gulliksen 
[8, 9], has been fully discussed in the previous paper [2]. However, one further 
comparison is suggested by the present inquiry. Assumption (3), giving the 
hypothetical response rates as linear functions of the previous number of 
correct responses, can be shown to be equivalent to a system of linear operators 
and is similar to the treatment of a situation with only one available response 
given by Bush and Mosteller [3]. Thus expression (5) can be included within 
a linear operator system if the operators are assumed to act not upon the 
probability of a response but upon response rates hypothetically underlying 
this probability. 










Estimation of Parameters 


For brevity of exposition, consider the estimation of parameters for the
special case arising when b = c = 0 in (3), or equivalently $\alpha = \gamma_1$,
$\beta = 0$ in (5). Thus, it is assumed that the effects of reward and punishment of a
response are confined to the response rate associated with this particular 
response and do not generalize to the other. This is the stochastic equivalent 
of the equation of the learning curve developed by Gulliksen [8, 9]. 

Consider the data obtained from a simple situation involving a choice
between two alternatives. We observe the sequence of choices made by a
subject as well as the response time for each trial. The response occurring
on the tth trial can be symbolized by a characteristic random variable, $X_t$.
If a correct response occurs, $X_t = 1$; if an incorrect response occurs,
$X_t = 0$. Similarly let $T_t$ be the response time at the tth trial. It should
be borne in mind, however, that the distribution of possible response times at
each trial is taken to be exponential and hence response times close to zero
are considered likely. Therefore $T_t$ should more properly be the difference
between the response time observed and the minimum response time found in the
experimental situation.

Suppose, then, that we have the results of n learning trials of an individual
subject, $X_t$ and $T_t$ ($t = 1, 2, \cdots, n$). At the tth trial, the
probability of a correct response at time $T_t$ is

$e^{-(r_t + w_t)T_t} r_t \Delta T$,

and the probability of an incorrect response at the same time is

$e^{-(r_t + w_t)T_t} w_t \Delta T$.

Hence the likelihood, $L_n$, of the entire sequence of responses and response
times is

(7)  $L_n = \prod_{t=1}^{n} \left[ e^{-(r_t + w_t)T_t} \, r_t^{X_t} \, w_t^{1 - X_t} \right]$.

We now seek those values of the parameters $r_1$, $w_1$, a, and d which
maximize $L_n$. It is more convenient to maximize

(8)  $\lambda_n = \log L_n = \sum_{t=1}^{n} \left[ -(r_t + w_t)T_t + X_t \log r_t + (1 - X_t) \log w_t \right]$.


Remembering that b = c = 0 is assumed, substitute for $r_t$ and $w_t$ from (3),
so that


(9)  $\lambda_n = \sum_{t=1}^{n} \left[ -(r_1 + w_1 + k_t a + f_t d)T_t + X_t \log (r_1 + k_t a) + (1 - X_t) \log (w_1 + f_t d) \right]$,














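where $f_t = t - 1 - k_t$. Differentiating $\lambda_n$ with respect to $r_1$,
$w_1$, a, and d, and setting the differentials equal to zero,


(10)  $\frac{\partial \lambda_n}{\partial r_1} = -\sum T_t + \sum \frac{X_t}{r_1 + k_t a} = 0$;

(11)  $\frac{\partial \lambda_n}{\partial a} = -\sum k_t T_t + \sum \frac{k_t X_t}{r_1 + k_t a} = 0$;

(12)  $\frac{\partial \lambda_n}{\partial w_1} = -\sum T_t + \sum \frac{1 - X_t}{w_1 + f_t d} = 0$;

(13)  $\frac{\partial \lambda_n}{\partial d} = -\sum f_t T_t + \sum \frac{f_t (1 - X_t)}{w_1 + f_t d} = 0$.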


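$r_1$ and $w_1$ can readily be eliminated. For example, consider (10) and (11).
Equation (11) may be rewritten as


(14)  $\sum k_t T_t = \sum \frac{X_t}{a} \left( 1 - \frac{r_1}{r_1 + k_t a} \right) = \frac{1}{a} \left( \sum X_t - r_1 \sum \frac{X_t}{r_1 + k_t a} \right)$.


It should be noted that $k_t = 0$ on the occasion of the first correct response,
and hence the summation in (11) extends over one less trial than that in
(10). Thus by appropriate substitution from (10),


$\sum k_t T_t = \frac{1}{a} \left[ k - 1 - r_1 \left( \sum T_t - \frac{1}{r_1} \right) \right] = \frac{1}{a} \left( k - r_1 \sum T_t \right)$,


where k is the total number of correct responses in the entire n learning
trials. Hence


(15)  $r_1 = \frac{k - a \sum k_t T_t}{\sum T_t}$.


Similarly, it may be shown that


(16)  $w_1 = \frac{n - k - d \sum f_t T_t}{\sum T_t}$.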

Substituting these values for $r_1$ and $w_1$ in (11) and (13) two equations are
obtained, each in one unknown, namely


(17)  $F(a) = \sum_t \left\{ \frac{k_t X_t}{\left[ \left( k - a \sum_u k_u T_u \right) / \sum_u T_u \right] + k_t a} \right\} - \sum_t k_t T_t = 0$,

      $F(d) = \sum_t \left\{ \frac{f_t (1 - X_t)}{\left[ \left( n - k - d \sum_u f_u T_u \right) / \sum_u T_u \right] + f_t d} \right\} - \sum_t f_t T_t = 0$.


These equations may appear formidable, but they are not difficult to set up
and can readily be solved by a numerical iterative procedure. Generally
a Taylor series expansion has been employed (e.g., see Whittaker and
Robinson [11]). Having found the appropriate estimates of a and d, (15)
and (16) give estimates of $r_1$ and $w_1$. If the alternative parameters for
the description of the choice sequence alone, $p$, $\alpha$, and $\gamma_2$, are
required, the appropriate substitutions are given by (4) and (5).


Goodness of Fit 


The stochastic processes described above are nonstationary, and hence 
no definitive answer to the problems of testing goodness of fit can be given. 
There are two kinds of data with which the theoretical description can be 
compared; each comparison presents rather different problems. 

Some consideration of the sequence of choices made by a subject has 
already been given [2]. It was suggested there that the most appropriate 
procedure would be to determine the distribution of likelihoods of all the 
possible sequences of length n, given the estimated parameters, and then to 
compare the likelihood of the observed sequence with this distribution. 
Unfortunately, as yet, we have been able to determine this distribution 
only for the simplest case arising from (5), when $\alpha = \beta$, $\gamma_1 = \gamma_2 = 0$. Lacking
any proper statistical procedure, it would apparently be best to compare 
visually the observed curve of cumulative successes against trial number 
with a theoretical curve based upon the computed conditional probabilities 
of success at each trial. Although this is not very satisfactory, it should give 
some indication of any gross discrepancies between the theoretical descrip- 
tion and the experimental data. 

In the case of the response times, some idea of the goodness of fit of the 
stochastic process can be given in the following way. (I am indebted to Dr. 
D. E. Barton of the Statistics Department, University College, London, 
for this suggestion.) Having estimated the parameters, the theoretical mean
response time at each trial, $L_t$, is given by (6). Since the response times
are assumed to be distributed exponentially at each trial, the ratio of the
observed response times, $T_t$, to the theoretical mean time $L_t$ (i.e.,
$R_t = T_t/L_t$) should be itself distributed exponentially. Hence
exp($-R_t$) should have a rectangular distribution in the region (0, 1). Thus
the over-all theoretical distribution of response times can be tested against
the observed data. Further, a plot of the transformations $R_t$ against the
trial number should reveal any marked trends away from the stochastic
description.
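The transformation described above is simple to compute. The sketch below uses simulated response times in place of observed ones, with hypothetical total rates $r_t + w_t$ that change from trial to trial; the function name, the rates, and the seed are assumptions made for illustration. When the model holds, the transformed values should be spread evenly over (0, 1), so their mean should be near one half.

```python
import math
import random


def rectangular_transform(T, L):
    """Return exp(-R_t) with R_t = T_t / L_t for each trial."""
    return [math.exp(-T_t / L_t) for T_t, L_t in zip(T, L)]


rng = random.Random(1)                               # fixed seed, repeatable run
rates = [0.5 + 0.001 * t for t in range(20_000)]     # hypothetical r_t + w_t
L = [1.0 / rate for rate in rates]                   # theoretical means from (6)
T = [rng.expovariate(rate) for rate in rates]        # simulated response times

u = rectangular_transform(T, L)
mean_u = sum(u) / len(u)
print(mean_u)
```

In an application the values of `L` would come from the estimated parameters, and the transformed values could be compared with the rectangular distribution by any standard test of fit.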


Conclusion 


It is apparent that answers to the problems of goodness of fit are not 
very satisfactory. In spite of this, it is suggested that the general approach 
presented here has some value for the description of experimental data. The 
procedures given should be sufficient for the comparison of learning behavior 
occurring under different experimental conditions. Furthermore, the kinds of
assumptions underlying the stochastic description make it possible to
introduce assumptions concerning the influence of other variables upon learning
behavior. In particular, a consideration of the relation between the
hypothetical response rates and conditions of motivation might be of some
interest.

The basic assumptions may also be modified easily, without changing
the general form of the mathematical development. Other theoretical
descriptions of learning behavior, therefore, might be readily put into the
form suggested by the present paper so that their formulation and verification
could be carried out with greater precision.


REFERENCES 


[1] Audley, R. J. A stochastic description of the learning behaviour of an individual
subject. Quart. J. exp. Psychol., 1957, 9, 12-20.

[2] Audley, R. J. and Jonckheere, A. R. Stochastic processes for learning. Brit. J. statist.
Psychol., 1956, 9, 87-94.

[3] Bush, R. R. and Mosteller, F. A mathematical model for simple learning. Psychol.
Rev., 1951, 58, 313-323.

[4] Bush, R. R. and Mosteller, F. A stochastic model with applications to learning. Ann.
math. Statist., 1953, 24, 559-585.

[5] Bush, R. R. and Mosteller, F. Stochastic models for learning. New York: Wiley, 1955.

[6] Christie, L. S. The measurement of discriminative behavior. Psychol. Rev., 1952, 59,
443-452.

[7] Feller, W. An introduction to probability theory and its applications. New York: Wiley,
1950.

[8] Gulliksen, H. A rational equation of the learning curve based on Thorndike’s laws
of effect. J. gen. Psychol., 1934, 11, 395-434.

[9] Gulliksen, H. A generalization of Thurstone’s learning function. Psychometrika, 1953,
16, 297-307.

[10] Mueller, C. G. Theoretical relationships among some measures of conditioning. Proc.
nat. Acad. Sci., 1950, 36, 123-130.

[11] Whittaker, E. T. and Robinson, G. The calculus of observations. London: Blackie, 1924.


Manuscript received 11/23/56 
Revised manuscript received 4/22/57 


























PSYCHOMETRIKA—VOL. 23, NO. 1 
MARCH, 1958 


DETERMINING THE DEGREE OF INCONSISTENCY IN A SET OF 
PAIRED COMPARISONS* 


HAROLD B. GERARD
BELL TELEPHONE LABORATORIES 


AND 


HAROLD N. SHAPIRO
NEW YORK UNIVERSITY 


Consistency in paired comparison data is defined. Two types of in- 
consistency which may arise are defined. Computational formulas for these 
types of inconsistency are derived, and examples illustrating the use of these 
formulas are presented. 


In a recent experiment [1], the authors were concerned with obtaining
a measure of S's psychological certainty concerning the probable success
of some future undertaking. After exposure to the experimental manipulations,
E presented S with seven 5 × 8 index cards, each printed with different odds
for success. The stimuli presented were: 10 to 1, 5 to 1, 2 to 1, 1 to 1,
1 to 2, 1 to 5, 1 to 10. All possible pairs of stimuli were presented,
and S was asked to select the member of each pair which better
reflected what he thought his chances were.

From this set of data a measure of both subjective probability of success 
and S’s degree of certainty regarding his estimate was desired. This problem 
is typical of many in which stimulus comparisons are made. What is presented 
in this paper is a method for analyzing the consistency of response in such 
experiments. The method, which involves matrix arithmetic, is quite difficult 
to formulate in all generality; presented here is a complete analysis of a 
special case. 


The Approach 


Let the stimulus cards appear as points $P_1, P_2, \ldots, P_n$ on a line which
represents a continuum of subjective probability. Let $X$ represent the position
of the individual on the line, i.e., his actual subjective probability:

(1) [Diagram: the points $P_1, P_2, \ldots, P_n$ marked in order along a line, with $X$ marked at the individual's position.]








*These ideas were developed while the first author was on the staff of the Research 
Center for Human Relations. The work was made possible by the ONR contract NONR 
285(10). The authors are indebted to Jack Moshman for his helpful critical suggestions. 
The United States Government is authorized to reprint this article in whole or in part. 







Consider the points $P_i$ and $P_j$, and the question, to which of $P_i$ and $P_j$ is
$X$ nearer? If $P_i$ is nearer to $X$ than $P_j$, write $a_{ij} = +1$. If $P_j$ is nearer to $X$
than $P_i$, write $a_{ij} = -1$. Define $a_{ii} = 0$. For all $i$ and $j$, $a_{ij} = -a_{ji}$.

All of the paired comparisons of the set of points may be tabulated in
a square $n \times n$ matrix $A = (a_{ij})$. This matrix has 0 in the principal diagonal.
If the row element is closer to $X$ than the column element, the entry is $+1$;
contrariwise the entry is $-1$. Since $a_{ij} = -a_{ji}$ the matrix $A$ is
skew symmetric.
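The tabulation just described can be sketched in a few lines of Python. This is only an illustration: the function name `answer_matrix`, the numeric positions, and the decision to reject exact ties are assumptions of the sketch, not part of the paper.

```python
def answer_matrix(points, x):
    """Build A = (a_ij) for stimulus positions `points` and subject position `x`.

    a_ij = +1 if P_i is nearer to X than P_j, -1 if P_j is nearer, 0 on the diagonal.
    Exact ties are rejected because the paper does not define an entry for them.
    """
    n = len(points)
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d_i = abs(points[i] - x)
            d_j = abs(points[j] - x)
            if d_i == d_j:
                raise ValueError("tie between two stimuli; entry undefined")
            a[i][j] = 1 if d_i < d_j else -1
    return a


# Hypothetical positions for four stimuli and one subject.
A = answer_matrix([0.0, 1.0, 2.0, 3.0], 1.2)
for row in A:
    print(row)
```

Every matrix produced this way is skew symmetric and, by construction, consistent in the sense defined in the next section.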















The Development 


Definition. An answer matrix is a skew symmetric $n \times n$ matrix where
entries off the main diagonal are all $+1$ or $-1$.

Definition. The set of responses, or the answer matrix $A$, is called
inconsistent if there exists no possible determination of distances between
the $P_i$, and no possible placement of $X$ in (1) for which $A$ is the answer
matrix. If some, not necessarily unique, determination of these distances
and placement of $X$ is possible then $A$ is called consistent.

Definition. For each $i$, $1 \le i \le n$, and a given answer matrix $A$, define
$\lambda_i = \lambda_i(A)$ as the smallest subscript $\lambda_i > i$ such that $a_{i\lambda_i} = +1$.
If no such subscript exists, define $\lambda_i = \infty$.

Definition. Define $p = p(A)$, the position index of an answer matrix
$A$, as $p = \min_i \lambda_i$. Note that it is possible that $p = \infty$, i.e., $\lambda_i = \infty$ for all
$i$, $1 \le i \le n$.



























THEOREM. The necessary and sufficient conditions for an answer matrix
$A$ to be consistent are that $p(A) = \infty$, or that $p(A) < \infty$ and there exists a $k$,
$1 \le k < n$, such that

(i) $p = p(A) = \lambda_k = k + 1$,

(ii) $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k$,

(iii) $\lambda_i = i + 1$ for $k < i < n$, and $\lambda_i = \infty$ for $i = n$,

(iv) $a_{ij} = +1$ for $j \ge \lambda_i$.

These conditions assert that in order to be consistent the skew symmetric
matrix $A$ has two connected regions of entries above the principal diagonal,
one of $+1$'s and the other of $-1$'s as pictured in Fig. 1. The boundary or
demarcation line between the regions appears as "steps" going up and to
the right. The case where $p = \infty$ is the degenerate case wherein there are
no $+1$'s above the diagonal.
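As a rough illustration, conditions (i) through (iv) can be checked mechanically. The sketch below is not the authors' procedure; the names `lambdas` and `is_consistent` are hypothetical, indices are converted from Python's 0-based rows to the paper's 1-based subscripts, and $\infty$ is represented by `float("inf")`.

```python
INF = float("inf")


def lambdas(a):
    """lambda_i (1-based) = first column j > i with a_ij = +1, or INF if none."""
    n = len(a)
    return [
        next((j + 1 for j in range(i + 1, n) if a[i][j] == 1), INF)
        for i in range(n)
    ]


def is_consistent(a):
    """Check the theorem's conditions for a skew symmetric +1/-1 answer matrix."""
    n = len(a)
    lam = lambdas(a)
    p = min(lam)
    if p == INF:                      # degenerate case: no +1 above the diagonal
        return True
    for k in range(1, n):             # paper's k, 1 <= k < n
        if not (p == lam[k - 1] == k + 1):                                 # (i)
            continue
        cond_ii = all(lam[i - 1] >= lam[i] for i in range(1, k))           # (ii)
        cond_iii = all(lam[i - 1] == i + 1 for i in range(k + 1, n))       # (iii), i < n
        cond_iii = cond_iii and lam[n - 1] == INF                          # (iii), i = n
        cond_iv = all(                                                     # (iv)
            a[i][j] == 1
            for i in range(n) if lam[i] != INF
            for j in range(int(lam[i]) - 1, n)
        )
        if cond_ii and cond_iii and cond_iv:
            return True
    return False


# Three stimuli: X between P_2 and P_3, nearer P_2, with P_1 nearer than P_3.
print(is_consistent([[0, -1, 1], [1, 0, 1], [-1, -1, 0]]))
```

For three stimuli the check accepts exactly four of the eight possible answer matrices, which agrees with enumerating the placements of $X$ by hand.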




















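[Figure: the entries above the principal diagonal of $A$, divided by a step-shaped boundary into a region of $-1$'s and a region of $+1$'s; labels mark the $i$th row and a column index.]

FIGURE 1
Pictorial Representation of the Conditions for Consistency of Matrix A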


An examination of the separation diagram (Fig. 1) is in practice the quick- 
est way of deciding whether or not the response matrix is consistent. 


PROOF. From the definition of $p$ it follows that, when $p = \infty$, all the
entries above the principal diagonal are $-1$, and hence those below this
diagonal are all $+1$. But this is precisely the answer matrix which corresponds
to placing the point $X$ to the right of the last point $P_n$ in (1) or closer
to $P_n$ than to $P_{n-1}$. In the following, then, we may restrict ourselves to the
case $p < \infty$, i.e., to those cases where $X$ is closer to $P_{n-1}$ than to $P_n$.

Necessity. Select a $k$ such that $X$ lies between $P_k$ and $P_{k+1}$. We may
assume that $X$ is closer to $P_k$ than to $P_{k+1}$. (If not, $X$ could be placed between
$P_{k+1}$ and $P_{k+2}$ and closer to $P_{k+1}$ without changing any of the answers which
determine the entries in the matrix $A$. This would replace the role of $k$ by
$k + 1$, and $X$ would be nearer $P_{k+1}$ than $P_{k+2}$.)





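(2) [Diagram: $P_k$ and $P_{k+1}$ on a line with $X$ between them, nearer to $P_k$.]

In (2), $a_{ij} = -1$ for $i < j \le k$, which implies that $\lambda_i > k$ for $i = 1, \ldots, k - 1$.
Since $a_{k,k+1} = +1$, $\lambda_k = k + 1$. Also for $i > k$, $j > i$ it is clear that
$a_{ij} = +1$, so that $\lambda_i = i + 1$. Thus $p = \lambda_k = k + 1$ and conditions (i) and
(iii) are established.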

In addition to knowing that $X$ is between $P_k$ and $P_{k+1}$ and closer to
$P_k$, how much additional information is necessary to determine completely
the answer matrix $A$? Suppose, for each $i$, $1 \le i \le k$, it is known which is
the first $P_j$, $j > i$, such that $X$ is closer to $P_i$ than to $P_j$. If $P_{\mu_i}$ is this point
it is clear that

$a_{ij} = -1$ for $j < \mu_i$,
and
$a_{ij} = +1$ for $j \ge \mu_i$;

and the matrix is completely determined. Clearly also $\lambda_i = \mu_i$ for $1 \le i \le k$.













Since it is immediate from the definition of $P_{\mu_i}$ that

$\mu_1 \ge \mu_2 \ge \cdots \ge \mu_k$,

therefore (ii) follows. From what has been given above (iv) is also immediate.
This completes the proof of necessity.

Sufficiency. We wish to show that if an answer matrix $A$ satisfies the
conditions (i), (ii), (iii), and (iv), we can find a configuration of the
distances between the $P_i$ and a position for $X$ in (1) which realizes it. Again
assume that $p(A) < \infty$. Let $p(A) = \lambda_k$. Place $P_k$ and $P_{k+1}$ on a line with
$X$ between them and closer to $P_k$, as in (2). Consider $\lambda_{k-1} \ge \lambda_k = k + 1$.
If $\lambda_{k-1} = k + 1$, place $P_{k-1}$ close to $P_k$ such that

$P_{k-1}X < XP_{k+1}$


(where $PQ$ denotes the length of the line segment $PQ$). If $\lambda_{k-1} > k + 1$,
place $P_{\lambda_{k-1}}$ to the right of $P_{k+1}$ and $P_{k-1}$ to the left of $P_k$ such that the
following inequalities are satisfied:

$P_{k-1}X > XP_{k+1}$,
$P_{k-1}X < XP_{\lambda_{k-1}}$.

Next consider $\lambda_{k-2} \ge \lambda_{k-1}$. If $\lambda_{k-2} = \lambda_{k-1}$, place $P_{k-2}$ to the left of, and
close to, $P_{k-1}$ such that

$P_{k-2}X < XP_{\lambda_{k-1}}$.

If $\lambda_{k-2} > \lambda_{k-1}$, place $P_{k-2}$ to the left of $P_{k-1}$ and $P_{\lambda_{k-2}}$ to the right of $P_{\lambda_{k-1}}$
so that

$P_{k-2}X > XP_{\lambda_{k-1}}$,
$P_{k-2}X < XP_{\lambda_{k-2}}$.

If this process stops at a $\lambda_i = \infty$, choose $P_i$ far enough to the left so that

$P_iX > XP_{\lambda_{i+1}}$,

and place the remaining $P_j$, $j < i$, to the left of $P_i$, and the remaining $P_h$,
$h > \lambda_{i+1}$, close to and to the right of $P_{\lambda_{i+1}}$ such that the last point $P_n$
satisfies

$P_iX > XP_n$.

The resulting configuration clearly has $A$ as its answer matrix. This completes
the proof of the sufficiency.


Fundamental types of inconsistency 

An answer matrix A may be inconsistent for a variety of reasons. Con- 
sider two simple reasons which we designate as fundamental. 

Intransitivity. Suppose we have a triplet of subscripts $i, j, k$ such that
$a_{ij} = +1$, $a_{jk} = +1$, $a_{ik} = -1$. Then the answer matrix is inconsistent,
for if the matrix $A$ is realized by the set of points, $P_i$, and a position for $X$,
we would have $P_i$, $P_j$, $P_k$ as three distinct points with

$|P_i - X| < |P_j - X|$ (from $a_{ij} = +1$),
$|P_j - X| < |P_k - X|$ (from $a_{jk} = +1$),
and
$|P_k - X| < |P_i - X|$ (from $a_{ik} = -1$).

But the first two inequalities imply $|P_i - X| < |P_k - X|$ in contradiction
to the third. Thus the matrix $A$ cannot be realized. An inconsistency
manifested in this way is called an intransitivity.

From the skew symmetry of an answer matrix, the triplet described
above implies also that

$a_{jk} = +1$, $a_{ki} = +1$, $a_{ji} = -1$,
and
$a_{ki} = +1$, $a_{ij} = +1$, $a_{kj} = -1$.

That is, there are apparently three intransitivities generated. In the following
we shall count these as one intransitivity involving the triplet of subscripts
$i, j, k$.

Separation. Suppose we have a triplet of subscripts $i < j < k$ such that
$a_{ij} = +1$ and $a_{jk} = -1$; then the answer matrix $A$ is inconsistent. From
$a_{ij} = +1$, $X$ must be to the left of $P_j$ and from $a_{jk} = -1$, $X$ must be to
the right of $P_j$, which is impossible. An inconsistency manifested by a
triplet $i, j, k$ such that $i < j < k$ and $a_{ij} = +1$, $a_{jk} = -1$, is called a
separation.

It is important to note that the cause of an intransitivity is independent 
of the ordering property of the points, whereas separations are intimately 
connected with some assumed a priori order requirement. It is not true that 
intransitivities and separation errors are the only possible errors one can 
characterize in a set of paired comparisons. It is, however, true that in- 
consistency as herein defined will result in at least one intransitivity and/or 
separation error. The concepts of separation and intransitivity are independent 
in the sense that there exist answer matrices possessing one without the 
other. 
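Both definitions can be applied directly by enumerating triplets. The following brute-force sketch is only meant to make the definitions concrete (the function names are hypothetical); the two 3 × 3 matrices at the end are invented examples showing one type of inconsistency without the other.

```python
from itertools import combinations


def count_intransitivities(a):
    """Count triplets {i, j, k} whose three comparisons form a cycle.

    A cycle i -> j -> k -> i shows up as a_ij = a_jk = a_ki (all +1 or all -1),
    so each cyclic triplet is counted once, as in the paper's convention.
    """
    n = len(a)
    return sum(
        1 for i, j, k in combinations(range(n), 3)
        if a[i][j] == a[j][k] == a[k][i]
    )


def count_separations(a):
    """Count triplets i < j < k with a_ij = +1 and a_jk = -1."""
    n = len(a)
    return sum(
        1 for i, j, k in combinations(range(n), 3)
        if a[i][j] == 1 and a[j][k] == -1
    )


cyclic = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]      # intransitivity, no separation
separated = [[0, 1, 1], [-1, 0, -1], [-1, 1, 0]]   # separation, no intransitivity
print(count_intransitivities(cyclic), count_separations(cyclic))
print(count_intransitivities(separated), count_separations(separated))
```

Enumeration costs on the order of $n^3$ operations, which is negligible for the seven stimuli of the experiment.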


Characterization of consistency 


THEOREM. An answer matrix is consistent if and only if it contains no
intransitivities or separations.

PROOF. The "only if" part of the theorem is quite trivial since the
presence of an intransitivity or separation renders an answer matrix
inconsistent. On the other hand, assume the answer matrix has no
intransitivities or separations. We will prove that it satisfies the conditions (i),
(ii), (iii), (iv). Clearly we may assume that $p(A) < \infty$, since if $p(A) = \infty$
the matrix is consistent.

Let the $r$th column be the first column such that a $+1$ appears above
the main diagonal; $r > 1$, and it exists since we assume that $p(A) < \infty$. We
will first prove that above the main diagonal all $-1$'s in the $k$th column
are above all $+1$'s. For if otherwise,

$a_{ik} = +1$, $a_{jk} = -1$, $i < j < k$,
or
$a_{ik} = +1$, $a_{kj} = +1$.

Since there are no intransitivities this implies $a_{ij} = +1$. But then $a_{ij} = +1$,
$a_{jk} = -1$, $i < j < k$ is a separation, which is impossible.

The above argument demonstrates that above the main diagonal $-1$'s
and $+1$'s group themselves as required. Now in the $r$th column (the first
column with a $+1$ above the main diagonal) we must have $a_{r-1,r} = +1$, so
that $\lambda_{r-1} = r$. Thus there remains to prove only that

(3) $\lambda_{r-1} \le \lambda_{r-2} \le \cdots \le \lambda_1$,

since it then follows that $p(A) = \lambda_{r-1}$ and $\lambda_i = i + 1$ for $i \ge r$. Suppose
that (3) is false, i.e., for some $k$, $r - 2 \ge k \ge 1$, $\lambda_k < \lambda_{k+1}$. Then $a_{k\lambda_k} = +1$,
$a_{k+1,\lambda_k} = -1$ since $\lambda_k \ge r \ge k + 2 > k + 1$. But this contradicts the fact
that above the main diagonal $-1$'s are above $+1$'s in each column. We must
also verify condition (iv) of the consistency theorem, i.e., that for $j > \lambda_i$,
$a_{ij} = +1$. Suppose that this is false, i.e., for some $j > \lambda_i$, $a_{ij} = -1$. Since
$a_{i\lambda_i} = +1$, $j > \lambda_i$, then $a_{ji} = +1$, and since there are no intransitivities
$a_{j\lambda_i} = +1$. It follows that $a_{\lambda_i j} = -1$ and $a_{i\lambda_i} = +1$, which is a separation.

Thus the conditions of the consistency theorem are satisfied and $A$ is
consistent.

Since the notions of intransitivity and separation lie at the basis of the 
degree to which an answer matrix can be said to be inconsistent, we next 
consider the question of determining the number of intransitivities and 
separations. 


Number of intransitivities 
Let $T$ = the number of intransitivities in an answer matrix $A$; $R_k$ = the
sum of entries in the $k$th row of $A$.

THEOREM.

$$T = \frac{1}{24}\left[n(n^2 - 1) - 3\sum_{k=1}^{n} R_k^2\right].$$


PROOF. For convenience, introduce $C_j = \sum_{i=1}^{n} a_{ij}$ = sum of the entries
in the $j$th column of $A$. Since $a_{ij} = -a_{ji}$,

(4) $C_j = -R_j$.

We first consider for a given pair $i, j$ ($i \ne j$) the number of $k$ such that $k \ne i, j$, and

(5) $a_{ik} = +1$, $a_{kj} = +1$, $a_{ij} = -1$,
or
$a_{jk} = +1$, $a_{ki} = +1$, $a_{ji} = -1$.

Let
$N_{++}^{(i,j)}$ = no. of $k$, $1 \le k \le n$, such that $a_{ik} = +1$, $a_{kj} = +1$;
$N_{--}^{(i,j)}$ = no. of $k$, $1 \le k \le n$, such that $a_{ik} = -1$, $a_{kj} = -1$;
$N_{+-}^{(i,j)}$ = no. of $k$, $1 \le k \le n$, such that $a_{ik} = +1$, $a_{kj} = -1$;
$N_{-+}^{(i,j)}$ = no. of $k$, $1 \le k \le n$, such that $a_{ik} = -1$, $a_{kj} = +1$.

The superscript $(i, j)$ is omitted in what follows.

Denote by $Z$ the $n \times n$ matrix with zeros on the main diagonal and all
other entries $+1$. For any matrix $U$, $[U]_{ij}$ denotes the entry in the $i$th row
and $j$th column of $U$. Then, for $i \ne j$,

(i) $n - 2 = N_{++} + N_{--} + N_{+-} + N_{-+}$,
(ii) $[AZ]_{ij} = N_{++} - N_{--} + N_{+-} - N_{-+}$,
(iii) $[ZA]_{ij} = N_{++} - N_{--} - N_{+-} + N_{-+}$,
and
(iv) $[A^2]_{ij} = N_{++} + N_{--} - N_{+-} - N_{-+}$.

Observe in (ii) that for $N_i^{+}$ = number of $k$ such that $a_{ik} = 1$, and $N_i^{-}$ =
number of $k$ such that $a_{ik} = -1$, we have $[AZ]_{ij} = N_i^{+} - N_i^{-}$, and
$N_i^{+} = N_{++} + N_{+-}$, $N_i^{-} = N_{-+} + N_{--}$. Note in addition that

(6) $[AZ]_{ij} = \sum_{k=1}^{n} a_{ik} - a_{ij} = R_i - a_{ij}$,
$[ZA]_{ij} = \sum_{k=1}^{n} a_{kj} - a_{ij} = C_j - a_{ij}$,
$[A^2]_{ij} = \sum_{k=1}^{n} a_{ik}a_{kj}$.

Adding (ii) and (iii), then (i) and (iv), one obtains respectively

(v) $\tfrac{1}{2}([AZ]_{ij} + [ZA]_{ij}) = N_{++} - N_{--}$,
and
(vi) $\tfrac{1}{2}(n - 2 + [A^2]_{ij}) = N_{++} + N_{--}$.

Now if $N_{++} \ne 0$ there exists a $k$ such that $a_{ik} = +1$, $a_{kj} = +1$, so that in
order to have consistency $a_{ij}$ would have to be $+1$. On the other hand, if
$N_{--} \ne 0$ there exists a $k$ such that $a_{ik} = -1$, $a_{kj} = -1$ or $a_{jk} = +1$,
$a_{ki} = +1$, and consistency would require that $a_{ji} = +1$ or $a_{ij} = -1$.
Therefore if $a_{ij} = +1$, the number of intransitivities involving $i, j$ as in
(5) equals $N_{--}$; if $a_{ij} = -1$, the number of intransitivities involving $i, j$
as in (5) equals $N_{++}$. Thus, in any event, the number of intransitivities
involving $i, j$ as in (5) equals

$\tfrac{1}{2}[(1 + a_{ij})N_{--} + (1 - a_{ij})N_{++}] = \tfrac{1}{2}[(N_{--} + N_{++}) - a_{ij}(N_{++} - N_{--})]$.

Using (v), (vi), this in turn equals

$\tfrac{1}{4}\{(n - 2) + [A^2]_{ij} - a_{ij}([AZ]_{ij} + [ZA]_{ij})\}$.

From (6) this may be rewritten

$\mu_{ij} = \tfrac{1}{4}[(n - 2) + \sum_k a_{ik}a_{kj} - a_{ij}(R_i + C_j - 2a_{ij})]$.

Also,

$T = \tfrac{1}{6}\sum_{i \ne j} \mu_{ij}$.

The factor 1/6 arises since in $\mu_{ij}$, $i$ and $j$ have symmetrical roles so that,
in the sum over all unequal $i, j$, each comparison is counted twice. Also an
extra factor of 1/3 is introduced since we do not count as distinct a
"permutation" of an intransitivity. Since for $i \ne j$, $a_{ij}^2 = 1$ we may rewrite

$\mu_{ij} = \tfrac{1}{4}[n + \sum_k a_{ik}a_{kj} - a_{ij}(R_i + C_j)]$.


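Now,

$\sum_{i \ne j}\sum_k a_{ik}a_{kj} = \sum_{i,j}\sum_k a_{ik}a_{kj} + \sum_{i,k} a_{ik}^2$
$= \sum_k \left(\sum_i a_{ik}\right)\left(\sum_j a_{kj}\right) + n(n - 1)$
$= \sum_k C_k R_k + n(n - 1)$
$= -\sum_k R_k^2 + n(n - 1)$.

Also,

$\sum_{i \ne j} a_{ij}(R_i + C_j) = \sum_{i,j} a_{ij}(R_i + C_j)$
$= \sum_i R_i \sum_j a_{ij} + \sum_j C_j \sum_i a_{ij}$
$= \sum_k (R_k^2 + C_k^2) = 2\sum_k R_k^2$.

Finally, since $\sum_{i \ne j} n = n^2(n - 1)$,

$T = \tfrac{1}{6}\sum_{i \ne j} \mu_{ij}$
$= \tfrac{1}{24}\left(n^2(n - 1) + n(n - 1) - \sum_k R_k^2 - 2\sum_k R_k^2\right)$
$= \tfrac{1}{24}\left[n(n^2 - 1) - 3\sum_k R_k^2\right]$,

which establishes the theorem. The formula easily may be transformed
into a result given by Kendall ([2], p. 156) for what he calls circular triads.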
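The closed form for $T$ needs only the row sums, which makes it easy to evaluate. A minimal sketch follows; the function name is hypothetical and the two small matrices are invented test cases, not data from the paper.

```python
def intransitivities_from_row_sums(a):
    """T = [n(n^2 - 1) - 3 * sum of squared row sums] / 24 for an answer matrix."""
    n = len(a)
    sum_sq = sum(sum(row) ** 2 for row in a)
    numerator = n * (n * n - 1) - 3 * sum_sq
    # For a valid answer matrix the numerator is a multiple of 24.
    if numerator % 24 != 0:
        raise ValueError("not a skew symmetric +1/-1 answer matrix")
    return numerator // 24


# A single cycle among three stimuli: row sums are all 0, so T = 24/24 = 1.
cycle = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]
# Four stimuli with every comparison favouring the lower index: T = 0.
ordered = [[0, 1, 1, 1], [-1, 0, 1, 1], [-1, -1, 0, 1], [-1, -1, -1, 0]]
print(intransitivities_from_row_sums(cycle), intransitivities_from_row_sums(ordered))
```

Because only row sums enter, the count can be obtained from a tally of how often each stimulus was chosen, without inspecting individual triplets.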


Number of separations 
By a method similar to that used above, a formula may be obtained for
the number of separations in an answer matrix $A$. Let $\hat{A}$ = matrix resulting
from $A$ by making all entries below the main diagonal equal to zero, and
write

$\hat{A} = (\hat{a}_{ij})$,

so that $\hat{a}_{ij} = 0$ for $i \ge j$ and $\hat{a}_{ij} = a_{ij}$ for $i < j$. Also let

$\hat{C}_k$ = sum of the entries of the $k$th column of $\hat{A}$;
$\hat{R}_k$ = sum of the entries of the $k$th row of $\hat{A}$;
$S = S(A)$ = number of separations in $A$.

THEOREM.

$$S = \frac{1}{4}\left[\frac{n(n - 1)(n - 2)}{6} + \sum_k \hat{C}_k(n - k) - \sum_k \hat{R}_k(k - 1) - \sum_k \hat{R}_k\hat{C}_k\right].$$


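PROOF. Let $Z = (z_{ij})$ be the $n \times n$ matrix with $+1$ above the main
diagonal and zero elsewhere. We propose to count for a fixed pair, $i < j$,
the number of $k$, $i < k < j$, such that $a_{ik} = +1$ and $a_{kj} = -1$. For $j > i$,

(7) $[\hat{A}Z]_{ij} = \sum_k \hat{a}_{ik}z_{kj} = \sum_{j > k > i} \hat{a}_{ik} = N_{++}^{(i,j)} - N_{--}^{(i,j)} + N_{+-}^{(i,j)} - N_{-+}^{(i,j)}$,

where $N_{++}^{(i,j)}$ = no. of $k$, $i < k < j$, such that $a_{ik} = +1$, $a_{kj} = +1$ and the
others are defined analogously. Where no confusion is possible the superscript
$(i, j)$ is omitted.

(8) $[Z\hat{A}]_{ij} = \sum_k z_{ik}\hat{a}_{kj} = \sum_{j > k > i} \hat{a}_{kj} = N_{++} - N_{--} - N_{+-} + N_{-+}$,

(9) $j - 1 - i = N_{++} + N_{--} + N_{+-} + N_{-+}$,

(10) $[\hat{A}^2]_{ij} = \sum_k \hat{a}_{ik}\hat{a}_{kj} = N_{++} + N_{--} - N_{+-} - N_{-+}$.

To solve for $N_{+-}^{(i,j)}$,

$\tfrac{1}{2}\{[\hat{A}Z]_{ij} - [Z\hat{A}]_{ij}\} = N_{+-} - N_{-+}$,
$\tfrac{1}{2}\{j - i - 1 - [\hat{A}^2]_{ij}\} = N_{+-} + N_{-+}$,

so that

(11) $N_{+-} = \tfrac{1}{4}\{[\hat{A}Z]_{ij} - [Z\hat{A}]_{ij} + j - i - 1 - [\hat{A}^2]_{ij}\}$,

and the total number of separations

(12) $S = \sum_{i < j} N_{+-}^{(i,j)}$.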


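Note that

(13) $\sum_{j > i}\left(\sum_{j > k > i} \hat{a}_{ik}\right) = \sum_{i < k < j} \hat{a}_{ik} = \sum_{i < k} \hat{a}_{ik}(n - k) = \sum_k (n - k)\hat{C}_k$,

(14) $\sum_{j > i}\left(\sum_{j > k > i} \hat{a}_{kj}\right) = \sum_{i < k < j} \hat{a}_{kj} = \sum_{k < j} \hat{a}_{kj}(k - 1) = \sum_k (k - 1)\hat{R}_k$,

(15) $\sum_{j > i} (j - i - 1) = \sum_{j > i} (j - i) - \frac{n(n - 1)}{2} = \frac{n(n^2 - 1)}{6} - \frac{n(n - 1)}{2} = \frac{n(n - 1)(n - 2)}{6}$,

(16) $\sum_{j > i}\sum_k \hat{a}_{ik}\hat{a}_{kj} = \sum_k \left(\sum_i \hat{a}_{ik}\right)\left(\sum_j \hat{a}_{kj}\right) = \sum_k \hat{R}_k\hat{C}_k$.

Combining the ten equations, (7) through (16), yields

$$S = \frac{1}{4}\left[\frac{n(n - 1)(n - 2)}{6} + \sum_k \hat{C}_k(n - k) - \sum_k (k - 1)\hat{R}_k - \sum_k \hat{R}_k\hat{C}_k\right].$$

This result completes the proof.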
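The formula for $S$ can be evaluated from the row and column sums of the upper triangle alone. The sketch below is illustrative; the function name is hypothetical, and Python's 0-based index `k` stands for the paper's subscript $k + 1$.

```python
def separations_from_sums(a):
    """S from the row sums R^_k and column sums C^_k of the upper triangle of A."""
    n = len(a)
    r_hat = [sum(a[k][j] for j in range(k + 1, n)) for k in range(n)]
    c_hat = [sum(a[i][k] for i in range(k)) for k in range(n)]
    total = n * (n - 1) * (n - 2) // 6
    for k in range(n):
        # paper's (n - k) and (k - 1) become n - (k + 1) and k with 0-based k
        total += c_hat[k] * (n - (k + 1)) - r_hat[k] * k - r_hat[k] * c_hat[k]
    if total % 4 != 0:
        raise ValueError("not a skew symmetric +1/-1 answer matrix")
    return total // 4


# Invented 3 x 3 case with a_12 = +1 and a_23 = -1, i.e. exactly one separation.
separated = [[0, 1, 1], [-1, 0, -1], [-1, 1, 0]]
print(separations_from_sums(separated))
```

Unlike $T$, this count depends on the assumed a priori order of the stimuli, since permuting rows and columns changes which entries lie above the diagonal.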


Consideration of the following three examples will clarify the application
of the formulas. In all three examples, $n = 7$.

Example I

                                             $R_k$   $\hat{R}_k$
       0   -1   -1   +1   +1   +1   +1        2      2
      +1    0   +1   +1   +1   +1   +1        6      5
      +1   -1    0   +1   +1   +1   +1        4      4
A =   -1   -1   -1    0   +1   +1   +1        0      3
      -1   -1   -1   -1    0   +1   +1       -2      2
      -1   -1   -1   -1   -1    0   +1       -4      1
      -1   -1   -1   -1   -1   -1    0       -6      0

$\hat{C}_k$    0   -1    0    3    4    5    6

$\tfrac{1}{6}[n(n - 1)(n - 2)] = 35$;
$\sum_k \hat{C}_k(n - k) = -5 + 9 + 8 + 5 = 17$;
$\sum_k (k - 1)\hat{R}_k = 5 + 8 + 9 + 8 + 5 = 35$;
$\sum_k \hat{R}_k\hat{C}_k = 0 - 5 + 9 + 8 + 5 = 17$;
$S = \tfrac{1}{4}(35 + 17 - 35 - 17) = 0$;
$\sum_k R_k^2 = 4 + 36 + 16 + 0 + 4 + 16 + 36 = 112$;
$T = \tfrac{1}{24}[7(48) - 3(112)] = 0$.

In this matrix there are no separations or intransitivities. The matrix $A$ is
consistent. A clear demarcation line exists above the diagonal separating
the $+1$ and $-1$ entries. This boundary appears as steps going up and to
the right. The matrix is realized by

[Diagram: the points $P_1, P_2, \ldots, P_7$ in order on a line, with the position of $X$ marked.]

This realization is certainly not unique. There are many possible realizations
which meet the criteria, namely the set of inequalities which must be satisfied.


Example II 


This exemplifies an answer matrix with separations and no intransitivities.
Form the matrix $A_1$ by changing $A$ of Example I so that $a_{67} = -1$
and $a_{76} = +1$. The sums then are

$R_k$          2    6    4    0   -2   -6   -4
$\hat{R}_k$    2    5    4    3    2   -1    0
$\hat{C}_k$    0   -1    0    3    4    5    4

Clearly $T = 0$. Now compute $S$.

$\sum_k \hat{C}_k(n - k) = -5 + 0 + 9 + 8 + 5 = 17$;
$\sum_k \hat{R}_k(k - 1) = 5 + 8 + 9 + 8 - 5 = 25$;
$\sum_k \hat{R}_k\hat{C}_k = 0 - 5 + 0 + 9 + 8 - 5 = 7$;
$S = \tfrac{1}{4}(35 + 17 - 25 - 7) = 5$.

The five separations would in fact be

$a_{16} = +1$, $a_{67} = -1$;
$a_{26} = +1$, $a_{67} = -1$;
$a_{36} = +1$, $a_{67} = -1$;
$a_{46} = +1$, $a_{67} = -1$;
$a_{56} = +1$, $a_{67} = -1$.

It is clear that the difference between the matrix $A$ and the matrix $A_1$ is
that the positions of $P_6$ and $P_7$ have been interchanged.


Example III 


This exemplifies an answer matrix with intransitivities but no separations.
Consider the answer matrix

                                        $R_k$   $\hat{R}_k$
   0   -1   -1   -1   -1   -1   +1       -4     -4
  +1    0   -1   -1   -1   -1   +1       -2     -3
  +1   +1    0   -1   -1   -1   +1        0     -2
  +1   +1   +1    0   -1   -1   +1        2     -1
  +1   +1   +1   +1    0   -1   +1        4      0
  +1   +1   +1   +1   +1    0   -1        4     -1
  -1   -1   -1   -1   -1   +1    0       -4      0

$\hat{C}_k$    0   -1   -2   -3   -4   -5    4

$\sum_k \hat{C}_k(n - k) = -5 - 8 - 9 - 8 - 5 = -35$;
$\sum_k (k - 1)\hat{R}_k = -3 - 4 - 3 + 0 - 5 = -15$;
$\sum_k \hat{R}_k\hat{C}_k = 0 + 3 + 4 + 3 + 0 + 5 + 0 = 15$;
$S = \tfrac{1}{4}(35 - 35 + 15 - 15) = 0$.

On the other hand,

$\sum_k R_k^2 = 16 + 4 + 0 + 4 + 16 + 16 + 16 = 72$;
$T = \tfrac{1}{24}(336 - 216) = 5$.

In general, inconsistent answer matrices will have both separations and
intransitivities. As a measure of deviation from consistency the quantity

$\Phi = \Phi(A) = S + T$

is suggested. In terms of this measure (since $\Phi = 0$ if and only if $S = T = 0$),
a previous theorem provides that $A$ is consistent if and only if $\Phi(A) = 0$.


Summary and Remarks 


The problem of determining degree of inconsistency within a set of 
paired comparisons has been considered. A definition of consistency was given 
and two fundamental types of inconsistency were defined—namely, in- 
transitivity and separation. The latter is intimately related to an assumed 
a priori ordering of stimuli. Formulas were given which enable the counting 
of the number of each type of inconsistency in a set of data. Proofs of these 
formulas were also provided. It can be shown that separations can occur
without intransitivities and vice versa. In general, however, inconsistent
data will contain both separations and intransitivities.

The criterion of consistency developed in this paper is made up of two
components: intransitivities and separation errors. The counting of separation
errors is appropriate only where an a priori ordering is assumed. However,
S may violate the assumed order, and re-order the stimuli in a way which
appears consistent to him. One may call a set of responses relatively
consistent if there exists some ordering relative to which the responses are
consistent, i.e., there are no intransitivities. It is in fact easily seen that a set of
responses is relatively consistent if and only if there are no intransitivities.
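One way to exhibit such an ordering, offered here as an assumption of this sketch rather than as the authors' procedure, is to rank the stimuli by row sum, from most often judged nearer to least often. When there are no intransitivities the re-ordered matrix has only $+1$ entries above the diagonal, which is the consistent pattern obtained with $X$ placed to the left of all points. The function names are hypothetical.

```python
from itertools import combinations


def has_intransitivity(a):
    """True if some triplet of comparisons forms a cycle."""
    return any(
        a[i][j] == a[j][k] == a[k][i]
        for i, j, k in combinations(range(len(a)), 3)
    )


def reorder_by_row_sum(a):
    """Return (order, matrix) with stimuli re-ordered by decreasing row sum."""
    order = sorted(range(len(a)), key=lambda i: sum(a[i]), reverse=True)
    reordered = [[a[i][j] for j in order] for i in order]
    return order, reordered


# Invented 3 x 3 matrix with one separation but no intransitivity.
separated = [[0, 1, 1], [-1, 0, -1], [-1, 1, 0]]
order, fixed = reorder_by_row_sum(separated)
print(order)   # positions of the original stimuli in the new ordering
print(fixed)
```

In this small case interchanging the second and third stimuli removes the separation, which is the sense in which the responses are relatively consistent.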

Implicit in the model is that the stimuli are thought of as being pre- 
sented simultaneously. If one is interested in the effect of order of presentation 
upon choice and consistency, a modification of the method may be made. 
One could consider each stimulus as being a composite of the original stimulus 
with its order of presentation, and treat each composite stimulus as if it 
were presented simultaneously, i.e., as the stimuli were treated in the above 
model. 


REFERENCES 


[1] Gerard, H. B. Some factors affecting an individual’s estimate of his probable success 
in a group situation, J. abnorm. soc. Psychol., 1956, 52, 235-239. 
[2] Kendall, M. G. Rank correlation methods, (2nd ed.) New York: Hafner, 1955. 


Manuscript received 12/10/56 
Revised manuscript received 5/13/57 











PSYCHOMETRIKA—VOL, 23, No. 1 
MARCH, 1958 


PROPERTIES OF THE ITEM SCORE MATRIX 


ANGUS G. MACLEAN
CALIFORNIA TEST BUREAU 


A method of deriving from the item score matrix all the usual statistics 
describing the performance on a test of a group of examinees is given. Since 
this matrix usually is not actually written out, but is implicit in a set of 
punched cards, a method of working from a more compact matrix F is described.
A numerical example is presented. Applications and advantages of
the method are cited, as compared with that of recording only the examinees’ 
test scores and the item difficulties. 


Equally Weighted Items 


An item score matrix (X) is an N by n rectangular matrix with elements
$X_{si}$ all of which are either 1 or 0. Each row of (X) is a row vector $(X_s)$,
which lists the item scores of student s. If items are to be weighted equally
the sum of the elements of $(X_s)$ is $\sum_i X_{si} = X_s$, the test score of student
s. The sum of the test scores of all students in the sample is

(1) $\sum_{s=1}^{N} X_s = \sum_s \sum_i X_{si} = T$,

the sum of all elements of (X).

The column sums of (X) are of interest since

(2) $\sum_s X_{si} = T_i$,

the number of students responding correctly to item i.

The square of the test score for student s is obtainable by premultiplying
the row vector $(X_s)$ by its transpose, a procedure which yields a square
symmetric matrix of unit rank:

(3) $(X_s^2) = (X_s)'(X_s)$.

The sum of all elements of this matrix is $X_s^2$.

Some of the operations to be discussed lead to scalar values, others to 
matrices, the sums of whose elements are those values. For the purposes of 
clarity, therefore, all symbols for matrices are enclosed in parentheses, while 
symbols not so enclosed will denote numbers. 

The elements of $(X_s)'(X_s)$ are the products $X_{si}X_{sj}$ for student s. Therefore

(4) $X_s^2 = \sum_i \sum_j X_{si}X_{sj}$.

In general, the square of a sum may be obtained by squaring the row vector
whose elements are the sum's components, then summing the elements of
the square matrix so obtained.
Summing (4) over the N students gives

(5) $\sum_{s=1}^{N} X_s^2 = \sum_s \sum_i \sum_j X_{si}X_{sj} = S$.

S is also obtained by summing the elements of a square symmetric matrix
(S) obtained by

(6) $(S) = (X)'(X)$.

It could also be obtained by adding the N matrices $(X_s^2)$ obtained by (3),
that is,

(7) $(S) = (X)'(X) = \sum_{s=1}^{N} (X_s)'(X_s)$.

The side elements of (S) are the cross-product sums $S_{ij}$ of the columns
of (X), while the diagonal elements $S_i$ are the result of multiplying the columns
by themselves. That is,

(8) $S_i = \sum_s X_{si}^2$,

(9) $S_{ij} = \sum_s X_{si}X_{sj}$.

T and S always denote summation over the N individuals in the sample.
They are the statistics used in calculating standard deviations and
correlations, as follows:

(10) $\sigma_i = \dfrac{\sqrt{L_i}}{N}$,

(11) $r_{ij} = \dfrac{L_{ij}}{\sqrt{L_i L_j}}$,

in which

(12) $L_i = NS_i - T_i^2$,

(13) $L_{ij} = NS_{ij} - T_iT_j$.

It so happens, when scores are either 1 or 0, that

(14) $S_i = T_i = f_i$,

and

(15) $S_{ij} = f_{ij}$,







where $f_i$ denotes the number of students scoring 1 on item $i$ and $f_{ij}$ the
number scoring 1 on both $i$ and $j$. In other words, counting may be
substituted for adding and multiplying; the matrix (S) obtained by the
operation (X)'(X) is identical with the F (frequency) matrix described in a recent
paper on item selection methods [1]. This matrix (F) can be easily obtained
by IBM machines. It should be remarked that (11) yields phi coefficients
when scores are dichotomous.
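The counting procedure and formulas (11) through (13) can be sketched as follows. The 0/1 score matrix is a small hypothetical one (five students, four items) chosen for this illustration, and the function names are likewise assumptions of the sketch.

```python
from math import sqrt

# Hypothetical item scores: rows are students, columns are items.
X = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]


def frequency_matrix(x):
    """F = X'X obtained by counting: f_ij = number of students scoring 1 on both i and j."""
    n_items = len(x[0])
    return [
        [sum(1 for row in x if row[i] and row[j]) for j in range(n_items)]
        for i in range(n_items)
    ]


def phi(f, n_students, i, j):
    """Phi coefficient r_ij = L_ij / sqrt(L_i L_j), with L built from F as in (12), (13)."""
    l_ij = n_students * f[i][j] - f[i][i] * f[j][j]
    l_i = n_students * f[i][i] - f[i][i] ** 2
    l_j = n_students * f[j][j] - f[j][j] ** 2
    return l_ij / sqrt(l_i * l_j)


F = frequency_matrix(X)
print(F)
print(round(phi(F, len(X), 0, 3), 2))   # items 1 and 4 in the paper's numbering
```

The diagonal of `F` holds the item frequencies $f_i$, so no separate pass over the data is needed for the item difficulties.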

A procedure has thus been given for obtaining the usual descriptive
statistics from the matrix of item scores. In addition such a matrix will
yield a great deal of other information which a list of test scores will not.
From (X) itself the item difficulties (and, of course, their mean and variance)
may be obtained as well as the item variances, test scores, and the sum of
test scores of those responding correctly to any item. This last statistic is
useful in item selection and may be considered as the product of column $i$
with the column of row sums, i.e.,

(16) $S_{it} = \sum_s X_{si}X_s$.

From (X)'(X) we can obtain the same information plus interitem and item-test
(point biserial) correlations, Kuder-Richardson reliability estimates,
etc. The relevant formulas and item selection procedures are discussed in [1].


Differentially Weighted Items 


Consider now the more general case of differentially weighted items.
The foregoing discussion and reference deal with the special case in which
every item is given a weight of unity in the general formula for a test score
composed of a linear sum of weighted item scores:

(17) $X_{sw} = w_1X_{s1} + w_2X_{s2} + \cdots + w_nX_{sn}$.

In matrix notation (17) is equivalent to

(18) $X_{sw} = (X_s)(W)'$,

where (W) is the row vector of item weights:

(19) $(W) = (w_1, w_2, \ldots, w_n)$.


If a matrix of weighted item scores is desired, perform the operation
$(X)(D_w)$, where $(D_w)$ is a diagonal matrix with elements $w_1$, $w_2$, etc. This
leaves the rows unsummed, whereas $(X)(W)'$ sums them. The following
operations yield the results indicated:

$(X_s)(D_w)$ = row of weighted item scores for student s.

$(X_s)(W)'$ = weighted test score of student s; sum of elements of $(X_s)(D_w)$.

$(X)(D_w)$ = N by n matrix of weighted item scores.

$(X)(W)'$ = column of N weighted test scores; sums of rows of $(X)(D_w)$.

$(D_w)(F)(D_w)$ = square symmetric matrix of order n exhibiting the weighted $S_i$ and $S_{ij}$ values, i.e., sums of squared weighted item scores and sums of their cross products. This is the matrix $(S_w)$.

$(D_w)(F)(W)'$ = column of the n weighted $S_{it}$ values; row sums of $(D_w)(F)(D_w)$.

$(W)(F)(W)'$ = $S_w$, the sum of squared weighted test scores. This is the sum of all elements of $(D_w)(F)(D_w)$.

$T_w$ may be obtained by summing the elements of $(X)(D_w)$ or $(X)(W)'$,
and the standard deviation of the weighted test scores will be

(20) $\sigma_w = \sqrt{NS_w - T_w^2}\,/\,N$.

If the squares of the individual weighted test scores are desired they
may be obtained by $(W)(X_s)'(X_s)(W)'$ or by summing the elements of
$(D_w)(X_s)'(X_s)(D_w)$, but it would be easier to square individually the elements
of $(X)(W)'$ already obtained.

Of course, the column sums of $(X)(D_w)$ are $f_{iw} = w_if_i$ and the row vector
of these is equal to $(f)(D_w)$. The weighted $S_i$ and $S_{ij}$ in $(D_w)(F)(D_w)$ are
equal to

(21) $S_{iw} = \sum_s w_i^2X_{si}^2 = w_i^2S_i$,

and

(22) $S_{ijw} = \sum_s w_iX_{si}w_jX_{sj} = w_iw_jS_{ij}$.


The foregoing techniques are applicable where item analysis is to be 
performed on a test composed of weighted items. Alternatively, if item 
scores had been punched 1 or 0 and it was subsequently decided to weight 
the items differentially, the mean, variance, and reliability of the revised 
version might be determined by these techniques. The procedure would 
employ the original F matrix, if F had been determined initially, or would 
generate F, and then apply the weighting matrices (W) or (D,,) to produce 
the desired information. 
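A minimal sketch of this re-weighting, working from F alone: the frequency matrix and weights below are hypothetical values chosen for illustration, and the variable names are assumptions of the sketch.

```python
# Hypothetical frequency matrix F = X'X for N = 5 students and 4 items.
N = 5
F = [
    [3, 1, 2, 1],
    [1, 3, 2, 1],
    [2, 2, 3, 1],
    [1, 1, 1, 2],
]
W = [3, 3, 5, 1]          # hypothetical item weights
n = len(W)

# T_w: sum of weighted test scores, from the item frequencies f_i on the diagonal.
T_w = sum(W[i] * F[i][i] for i in range(n))

# S_w = (W)(F)(W)': sum of squared weighted test scores.
S_w = sum(W[i] * F[i][j] * W[j] for i in range(n) for j in range(n))

mean_w = T_w / N
var_w = (N * S_w - T_w ** 2) / N ** 2    # square of (20)

print(T_w, S_w, mean_w, var_w)
```

No return to the individual item scores is needed; only `F`, `N`, and the weights enter the computation.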


Illustrative Example 
Suppose that five students* made the following scores on a set of four
items:

                    Item
Student        1     2     3     4     $X_s$
   1           1     0     1     0      2
   2           1     0     0     1      2
   3           0     1     1     1      3
   4           0     1     0     0      1
   5           1     1     1     0      3
 $f_i$         3     3     3     2     11 = T
 $p_i$        .60   .60   .60   .40

                               Row sums
                  3  1  2  1      7
(F) = (X)'(X) =   1  3  2  1      7
                  2  2  3  1      8
                  1  1  1  2      5
                                 27 = S

(This can be checked by squaring the row sums of (X).) Usual statistics:

$\bar{X} = T/N = 2.2$. Also, $\bar{X} = \sum_{i=1}^{n} p_i$.

$\bar{p} = T/nN = .55$.

$\sigma^2 = (NS - T^2)/N^2 = 14/25 = .56$.

$\text{K-R}_{20} = \dfrac{n}{n - 1}\left(1 - \dfrac{N^2\sum_i p_i(1 - p_i)}{NS - T^2}\right) = \dfrac{4}{3}\left(1 - \dfrac{24}{14}\right) = -.95$.

Kuder-Richardson formula 20 is an index of item homogeneity; a negative
value indicates a tendency for the items to be negatively intercorrelated.
Inspection of (X) confirms this. To obtain the phi coefficient between items
1 and 4,

$\phi_{14} = \dfrac{L_{14}}{\sqrt{L_1L_4}} = \dfrac{Nf_{14} - f_1f_4}{\sqrt{Nf_1 - f_1^2}\,\sqrt{Nf_4 - f_4^2}} = \dfrac{-1}{\sqrt{6}\,\sqrt{6}} = -.17$.

*This N is chosen purely for illustrative convenience. In practice a representative
sample of 200 or more cases is recommended to ensure greater reliability of the statistics
derived.











In many situations (but usually not in item selection) an L matrix is derived
from (S), with side elements $L_{ij} = NS_{ij} - T_iT_j$, and diagonal elements
$L_i = NS_i - T_i^2$. The side elements are the numerators of the correlation
coefficients, the denominators are the geometric means of the appropriate
diagonal elements. In matrix notation

(23) $(L) = N(S) - (T)'(T)$,

where (T) is a row vector containing the sums of scores on each variable
and N is, of course, a scalar. In the case of items scored 1 or 0 this becomes

(24) $(L) = N(F) - (f)'(f)$,

where (f) is the row vector of item frequencies (number of students scoring
1 on each item). Then, in the example,

$$(L) = \begin{pmatrix} 15 & 5 & 10 & 5 \\ 5 & 15 & 10 & 5 \\ 10 & 10 & 15 & 5 \\ 5 & 5 & 5 & 10 \end{pmatrix} - \begin{pmatrix} 9 & 9 & 9 & 6 \\ 9 & 9 & 9 & 6 \\ 9 & 9 & 9 & 6 \\ 6 & 6 & 6 & 4 \end{pmatrix} = \begin{pmatrix} 6 & -4 & 1 & -1 \\ -4 & 6 & 1 & -1 \\ 1 & 1 & 6 & -1 \\ -1 & -1 & -1 & 6 \end{pmatrix}.$$

It is evident from (L) that four out of the six interitem correlations are
negative. It may be noted that the L matrix may be converted into an item
covariance matrix by dividing every element by $N^2$.

Now suppose that it is desirable to apply a set of weights to the items
as follows:

Item Number    1  2  3  4
(W) =         (3  3  5  1)

Then:

                              Row sums of $(X)(D_w)$
                              $= (X)(W)' = \sum_i w_iX_{si}$
              3  0  5  0          8
              3  0  0  1          4
$(X)(D_w)$ =  0  3  5  1          9
              0  3  0  0          3
              3  3  5  0         11
                                 35 = $T_w$

and

                                  Row sums = $(D_w)(F)(W)'$
                   27   9  30   3        69
$(D_w)(F)(D_w)$ =   9  27  30   3        69
                   30  30  75   5       140
                    3   3   5   2        13
                                        291 = $(W)(F)(W)' = S_w$.

$\bar{X}_w = T_w/N = 7.0$,
$\sigma_w^2 = (NS_w - T_w^2)/N^2 = 230/25 = 9.20$.


REFERENCE 


[1] MacLean, A. G. and Tait, A. T. Some computational short-cuts in the development 
or analysis of tests. J. appl. Psychol., 1954, 38, 260-263. 


Manuscript received 3/6/57 
Revised manuscript received 5/16/57 














PSYCHOMETRIKA—VOL, 23, No. 1 
MARCH, 1958 


THE COUNSELING ASSIGNMENT PROBLEM* 


JOE H. WARD, JR.


AIR FORCE PERSONNEL AND TRAINING RESEARCH CENTER 


A disposition index, DI, which provides information about each possible
placement to be considered in a personnel classification situation is discussed.
The index is readily computed by machine methods and can be used by
counselors required to make assignments. The use of the disposition index
yields an adequate approximation to optimal solutions obtained by other
methods.


The personnel classification problem has been discussed previously by 
several authors [1, 2, 4, 5]. This problem has been shown to be similar to the 
Hitchcock-Koopmans transportation problem, which is a special case of 
linear programming [6]. The techniques presented in the following discussion 
have a direct analogy to the problem of a transportation scheduling supervisor 
who is responsible for transporting products from several origins to several 
destinations in an economical manner. 

The problem of assigning personnel to jobs generally has been stated 
as follows [6]: Given n persons to be assigned to n jobs and the productivity
of the ith person on the jth job, find an assignment of persons to jobs such
that total productivity is a maximum. A solution to this problem can be 
determined by linear programming techniques [2, 3, 6]; if the problem is 
not too large, the assignments can be determined by automatic methods 
without the intervention of counselors. This problem is of particular concern 
in military and large industrial personnel assignments but is not closely 
related to individual vocational guidance. 

A major difficulty with this approach to the problem is that the pro- 
ductivity values are generally only crude estimates of the value of a person 
on a job. Consequently there is still need for intervention by counselors 
to account for unforeseen significant information. An additional problem 
in the use of a completely counselor-free assignment procedure is that it is 
quite difficult to sell, operationally. This is probably due, in part, to the 
drastic, noticeable system change brought about by conversion from the 
old to the completely automated system. 

A reasonable approach indicates continuing the present counseling 
systems and providing increasingly valuable assignment information that 

*This report is based on work done under ARDC Project No. 7702, Task No. 17051, 
in support of the research and development program of the Air Force Personnel and Training 
Research Center, Lackland Air Force Base, Texas. Permission is granted for reproduction, 


translation, publication, use, and disposal in whole and in part by or for the United States 
Government. 




will lead to the optimal solution. Continuous gradual improvement of the 
information supplied to the counselor assignment process will result in more 
effective assignments. The procedure may ultimately converge to an auto- 
matic system—human intervention decreasing with increasing adequacy 
of productivity information. This procedure will have the advantage of 
gradual implementation—leading readily to acceptance because of minimum 
interference with existing procedures, and more adequate utilization of 
personnel. The following material will include a description of a placement 
or disposition index which can fit into a counselor assignment system. 


A Counseling Assignment Problem 


Consider the problem of assigning n men to n jobs given the productivity,
$c_{ij}$, of the ith man on the jth job. In the counseling situation it would be
desirable to have information (perhaps represented by a single index) associated
with each possible placement that would reflect characteristics of the
entire $c_{ij}$ array. In order to consider the relative merits of particular placements,
a counselor should have not only an individual assignee’s productivities
(as indicated by an aptitude score, achievement score, or some other measure)
but also an indication of the productivities of all other personnel to be placed.

Assume that an individual counselor is required to assign three men to 
three jobs, and suppose the productivity index matrix is as follows: 


                    Jobs
                 1    2    3
            1 |  8    7    6
Persons     2 |  5    1    0
            3 |  6    4    1


Assume further that the counselor can see only one man’s productivities (or 
perhaps test scores) at a time and that he adopts the policy of placing a man 
in an available job in which he has the highest productivity. If the men come 
to the counselor in the above order, the assignment would be as follows: 


Person Job 


1 1 
2 2 
3 3 


The first man’s highest index is on job one; the counselor will therefore 
place man one on job one. There are then two jobs remaining; since man 
two has a higher productivity on job two than on job three, he will have 
job two. Finally, the third man will be placed on job three. This sequence 
was selected as an example because it would provide the lowest possible 











sum, $c_{11} + c_{22} + c_{33} = 10$, and therefore would be considered the worst
assignment. The maximum sum, $c_{21} + c_{32} + c_{13} = 15$, would have resulted
only if the men had entered in the sequence 2, 3, 1. If there had been a
completely automatic system which would give the optimal assignment,
$c_{13} + c_{21} + c_{32} = 15$, all would be well if there were no possibilities of
additional information about productivities.
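The claims about this small array can be confirmed by enumeration. The sketch below is an illustration only, with function names that are not from the article. It lists the sums of all six one-to-one assignments and simulates the counselor's policy of giving each arriving man the open job on which his own productivity is highest.

```python
from itertools import permutations

# Productivity array c[i][j] of the example (person i on job j, zero-based).
c = [[8, 7, 6],
     [5, 1, 0],
     [6, 4, 1]]

def all_assignment_sums(c):
    """Map every one-to-one assignment (a permutation of jobs) to its sum."""
    n = len(c)
    return {jobs: sum(c[i][jobs[i]] for i in range(n))
            for jobs in permutations(range(n))}

def sequential_policy(c, order):
    """Place each person, in the given order, on the open job where his own
    productivity is highest (ties go to the lowest-numbered job)."""
    open_jobs = list(range(len(c[0])))
    placement = {}
    for person in order:
        job = max(open_jobs, key=lambda j: c[person][j])
        placement[person] = job
        open_jobs.remove(job)
    return placement

def placement_sum(c, placement):
    return sum(c[person][job] for person, job in placement.items())

sums = all_assignment_sums(c)
worst = placement_sum(c, sequential_policy(c, [0, 1, 2]))  # men arrive 1, 2, 3
best = placement_sum(c, sequential_policy(c, [1, 2, 0]))   # men arrive 2, 3, 1
print(min(sums.values()), max(sums.values()), worst, best)
```

The arrival order 1, 2, 3 reproduces the minimum sum and the order 2, 3, 1 the maximum, which is the sensitivity to arrival order that motivates an index based on the whole array.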

Assume now that the counselor has determined (before talking with the 
men) the optimal assignment and feels confident of his position. When man 
one enters, the counselor plans to place him on job three where his pro- 
ductivity is six. However, after further investigation, the counselor finds 
it is impossible to make this placement; for lack of a second recommended 
placement, the counselor places the first man on job one (productivity equal 
to 8) in his effort to maximize the assignment sum. It is now apparent that 
the counselor is on his way to making the worst placements again and will 
be forced into the minimum assignment sum $c_{11} + c_{22} + c_{33} = 10$.

Even though this example is made to demonstrate the worst situation, 
it is still apparent that it would be desirable to provide the counselor with 
information reflecting the relative merits of each placement. The disposition 
index, DI, that is to be developed should provide this type of information 
and should be expected to result in efficient assignments at small compu- 
tational expense. 


Development of a Disposition Index, DI 


Consider, first, placing the person p on the job q. Having made that
placement, assume that all possible assignments are made and that each
assignment of the n - 1 persons is equally likely. Then there are $(n-1)!$
possible sums containing $c_{pq}$, and the probability associated with each is
$1/(n-1)!$.

Now consider the mean value, $E(S_{pq})$, of the assignment sums containing
$c_{pq}$, and consequently the mean value, $E(s_{pq}) = E(S_{pq})/n$, of the productivities
contained in the $(n-1)!$ sums involving $c_{pq}$. Having selected the value $c_{pq}$,
the sums contain only elements from the (n - 1) remaining rows and columns
of the $c_{ij}$ array. Now each element, say $c_{ij}$, of the resulting square matrix of
order (n - 1) is contained in $(n-2)!$ of the $(n-1)!$ sums. Therefore it
follows that the mean value $E(S_{pq})$, and consequently $E(s_{pq})$, are obtained
as follows:


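$$E(S_{pq}) = \frac{(n-1)!\,c_{pq} + (n-2)!\,(c_{..} - c_{p.} - c_{.q} + c_{pq})}{(n-1)!},$$


where


$$c_{p.} = \sum_{j=1}^{n} c_{pj}, \qquad c_{.q} = \sum_{i=1}^{n} c_{iq}, \qquad c_{..} = \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij};$$


$$E(S_{pq}) = \frac{1}{n-1}\,[(n-1)c_{pq} + c_{..} - c_{p.} - c_{.q} + c_{pq}] = \frac{1}{n-1}\,[nc_{pq} - c_{p.} - c_{.q} + c_{..}], \tag{1}$$


and, dividing by n,

$$E(s_{pq}) = \frac{1}{n}E(S_{pq}) = \frac{1}{n(n-1)}\,[nc_{pq} - c_{p.} - c_{.q} + c_{..}]. \tag{2}$$
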
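Now consider the mean value, $E(S_{\overline{pq}})$, of the sums not containing $c_{pq}$;
consequently, the mean value $E(s_{\overline{pq}}) = E(S_{\overline{pq}})/n$ of the productivities
contained in sums not involving $c_{pq}$. There are $(n-1)(n-1)!$ such sums,
and the values of $E(S_{\overline{pq}})$ and $E(s_{\overline{pq}})$ are obtained as follows:


$$E(S_{\overline{pq}}) = \frac{1}{(n-1)(n-1)!}\,[(n-1)!\,c_{..} - (n-1)!\,c_{pq} - (n-2)!\,(c_{..} - c_{p.} - c_{.q} + c_{pq})] = \frac{1}{(n-1)^2}\,[(n-2)c_{..} + c_{p.} + c_{.q} - nc_{pq}], \tag{3}$$


and, dividing by n,


$$E(s_{\overline{pq}}) = \frac{1}{n}E(S_{\overline{pq}}) = \frac{1}{n(n-1)^2}\,[(n-2)c_{..} + c_{p.} + c_{.q} - nc_{pq}]. \tag{4}$$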


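Now consider the difference $D_{pq} = E(S_{pq}) - E(S_{\overline{pq}})$ between the mean
sum obtained when placing the pth person on the qth job and that obtained when not making
that particular placement. From (1) and (3),


$$\begin{aligned} D_{pq} &= \frac{1}{n-1}\,[nc_{pq} - c_{p.} - c_{.q} + c_{..}] - \frac{1}{(n-1)^2}\,[(n-2)c_{..} + c_{p.} + c_{.q} - nc_{pq}] \\ &= \frac{1}{(n-1)^2}\,[n(n-1)c_{pq} + (n-1)c_{..} - (n-1)(c_{p.} + c_{.q}) - (n-2)c_{..} - (c_{p.} + c_{.q}) + nc_{pq}] \\ &= \frac{1}{(n-1)^2}\,[n^2c_{pq} - n(c_{p.} + c_{.q}) + c_{..}], \end{aligned} \tag{5}$$


and, dividing by n,


$$d_{pq} = E(s_{pq}) - E(s_{\overline{pq}}) = \frac{1}{n}E(S_{pq}) - \frac{1}{n}E(S_{\overline{pq}}) = D_{pq}/n = \frac{1}{n(n-1)^2}\,[n^2c_{pq} - n(c_{p.} + c_{.q}) + c_{..}]. \tag{6}$$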











The value $D_{pq}$ represents the difference between the mean value of the
assignment sums involving $c_{pq}$ and the mean value of the assignment sums
not involving $c_{pq}$. The value $d_{pq}$ represents the difference between the mean
value of the productivities contained in assignment sums involving $c_{pq}$ and
the mean value of the productivities contained in assignment sums not
involving $c_{pq}$. It is apparent then that as the value of $d_{pq}$ or $D_{pq}$ increases
the placement of person p on job q is more likely to result in a larger assignment
sum.
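Equations (1), (3), and (5) can be checked against direct enumeration on a small array. The following Python sketch is an illustration under the stated equally-likely assumption and is not taken from the article. It averages the assignment sums that do and do not place person p on job q and compares them with the closed forms, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations

def enumerated_means(c, p, q):
    """Mean assignment sum over assignments that do / do not place p on q."""
    n = len(c)
    with_pq, without_pq = [], []
    for jobs in permutations(range(n)):
        total = sum(c[i][jobs[i]] for i in range(n))
        (with_pq if jobs[p] == q else without_pq).append(total)
    return (Fraction(sum(with_pq), len(with_pq)),
            Fraction(sum(without_pq), len(without_pq)))

def closed_form_means(c, p, q):
    """Equations (1), (3), and (5) for a square array c with n >= 2."""
    n = len(c)
    row = sum(c[p])                      # c_p.
    col = sum(r[q] for r in c)           # c_.q
    total = sum(sum(r) for r in c)       # c_..
    e_with = Fraction(n * c[p][q] - row - col + total, n - 1)              # (1)
    e_without = Fraction((n - 2) * total + row + col - n * c[p][q],
                         (n - 1) ** 2)                                      # (3)
    d = Fraction(n * n * c[p][q] - n * (row + col) + total, (n - 1) ** 2)   # (5)
    return e_with, e_without, d

c = [[8, 7, 6],
     [5, 1, 0],
     [6, 4, 1]]
for p in range(3):
    for q in range(3):
        e_with, e_without = enumerated_means(c, p, q)
        f_with, f_without, d = closed_form_means(c, p, q)
        assert (e_with, e_without) == (f_with, f_without)
        assert d == e_with - e_without
```

For the placement of man one on job one, for instance, the two assignments containing $c_{11}$ have sums 10 and 12 (mean 11), while the four that exclude it average 13.5, so $D_{11} = -2.5$.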

Some interesting properties of these equations are the following: 


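$$\sum_{i=1}^{n} E(S_{iq}) = \sum_{j=1}^{n} E(S_{pj}) = \sum_{i=1}^{n} E(S_{\overline{iq}}) = \sum_{j=1}^{n} E(S_{\overline{pj}}) = c_{..}, \tag{7}$$


$$\sum_{i=1}^{n} E(s_{iq}) = \sum_{j=1}^{n} E(s_{pj}) = \sum_{i=1}^{n} E(s_{\overline{iq}}) = \sum_{j=1}^{n} E(s_{\overline{pj}}) = c_{..}/n; \tag{8}$$


consequently,


$$\sum_{i=1}^{n} D_{iq} = \sum_{j=1}^{n} D_{pj} = \sum_{i=1}^{n} d_{iq} = \sum_{j=1}^{n} d_{pj} = 0. \tag{9}$$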

This indicates that the values of $D_{pq}$ and $d_{pq}$ are in a type of deviation
form simultaneously by rows and by columns. Putting the $c_{ij}$ matrix in deviation
form by rows (or columns) first and then in deviation form by columns (or
rows), the deviational form, $\delta_{pq}$, becomes


$$\delta_{pq} = \frac{1}{n^2}\,[n^2c_{pq} - n(c_{p.} + c_{.q}) + c_{..}]. \tag{10}$$


Therefore it can be seen that $\delta_{pq}$, obtained by putting the $c_{ij}$ matrix in
deviation form by rows and columns, differs from $D_{pq}$ only with respect to
the factor $1/n^2$, whereas $D_{pq}$ involves the factor $1/(n-1)^2$.
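The relation between the double-centered array and $D_{pq}$ can be illustrated numerically. The sketch below is not from the article and its helper names are hypothetical. It centers the example array by rows and then by columns, and checks that the result differs from $D_{pq}$ only by the ratio of the two factors, that is, $(n-1)^2 D_{pq} = n^2 \delta_{pq}$.

```python
from fractions import Fraction

def double_centered(c):
    """Deviation form by rows, then by columns: delta_pq of equation (10)."""
    n = len(c)
    by_rows = [[Fraction(v) - Fraction(sum(r), n) for v in r] for r in c]
    col_means = [sum(r[j] for r in by_rows) / n for j in range(n)]
    return [[r[j] - col_means[j] for j in range(n)] for r in by_rows]

def D_matrix(c):
    """D_pq of equation (5)."""
    n = len(c)
    total = sum(sum(r) for r in c)
    cols = [sum(r[j] for r in c) for j in range(n)]
    return [[Fraction(n * n * c[p][q] - n * (sum(c[p]) + cols[q]) + total,
                      (n - 1) ** 2)
             for q in range(n)] for p in range(n)]

c = [[8, 7, 6],
     [5, 1, 0],
     [6, 4, 1]]
n = len(c)
delta = double_centered(c)
D = D_matrix(c)

same_up_to_factor = all((n - 1) ** 2 * D[p][q] == n ** 2 * delta[p][q]
                        for p in range(n) for q in range(n))
print(same_up_to_factor, [sum(row) for row in D])
```

Because the two arrays are proportional, ranking placements by $\delta_{pq}$, $D_{pq}$, or $d_{pq}$ gives the same order.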

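Since it is frequently desired to assign m persons to n jobs, where $m \ge n$,
consider the expressions for $D_{pq}$ and $d_{pq}$ under these more general conditions.


$$E(S_{pq}) = \frac{(m-n)!}{(m-1)!}\left[\frac{(m-1)!}{(m-n)!}\,c_{pq} + \frac{(m-2)!}{(m-n)!}\,(c_{..} - c_{p.} - c_{.q} + c_{pq})\right],$$

where

$$c_{p.} = \sum_{j=1}^{n} c_{pj}, \qquad c_{.q} = \sum_{i=1}^{m} c_{iq}, \qquad c_{..} = \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij};$$

$$E(S_{pq}) = \frac{1}{m-1}\,[(m-1)c_{pq} + c_{..} - c_{p.} - c_{.q} + c_{pq}] = \frac{1}{m-1}\,[mc_{pq} - c_{p.} - c_{.q} + c_{..}], \tag{11}$$

and

$$E(s_{pq}) = \frac{1}{n}E(S_{pq}) = \frac{1}{n(m-1)}\,[mc_{pq} - c_{p.} - c_{.q} + c_{..}]; \tag{12}$$

$$E(S_{\overline{pq}}) = \frac{(m-n)!}{(m-1)(m-1)!}\left[\frac{(m-1)!}{(m-n)!}\,c_{..} - \frac{(m-1)!}{(m-n)!}\,c_{pq} - \frac{(m-2)!}{(m-n)!}\,(c_{..} - c_{p.} - c_{.q} + c_{pq})\right] = \frac{1}{(m-1)^2}\,[(m-2)c_{..} + c_{p.} + c_{.q} - mc_{pq}], \tag{13}$$

and

$$E(s_{\overline{pq}}) = \frac{1}{n}E(S_{\overline{pq}}) = \frac{1}{n(m-1)^2}\,[(m-2)c_{..} + c_{p.} + c_{.q} - mc_{pq}]. \tag{14}$$

Then the difference $D_{pq} = E(S_{pq}) - E(S_{\overline{pq}})$ leads to the expression

$$D_{pq} = \frac{1}{(m-1)^2}\,[m^2c_{pq} - m(c_{p.} + c_{.q}) + c_{..}]. \tag{15}$$


Dividing by n gives


$$d_{pq} = \frac{1}{n}D_{pq} = \frac{1}{n(m-1)^2}\,[m^2c_{pq} - m(c_{p.} + c_{.q}) + c_{..}]. \tag{16}$$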


It is important to notice the similarities among the several expressions previously
developed. We can write


$$E(S_{pq}) = k_1[nc_{pq} - c_{p.} - c_{.q}] + k_2 ,$$
$$E(S_{\overline{pq}}) = k_3[nc_{pq} - c_{p.} - c_{.q}] + k_4 ,$$
$$D_{pq} = k_5[nc_{pq} - c_{p.} - c_{.q}] + k_6 ,$$
$$d_{pq} = k_7[nc_{pq} - c_{p.} - c_{.q}] + k_8 .$$


Thus if the magnitude of any of these indices is used as a basis for
assignment, then the value


$$\phi_{pq} = nc_{pq} - c_{p.} - c_{.q} \tag{17}$$


will provide all of the distinguishing information among possible placements.
The easily computed index $\phi_{pq}$ provides a large amount of information concerning
the array of productivities.

There are several possible indices from which a disposition index, DI, 
may be chosen; the one probably most meaningful to the counselor is (2), 











$E(s_{pq})$. This index is the mean value of the productivities contained in all
possible assignment sums involving $c_{pq}$. It is directly related to the productivities,
and it has the same interpretation for any value of n. For the
more general case of m persons assigned to n jobs, where $m \ge n$, $E(s_{pq})$ is
given by (12). It is therefore suggested that the disposition index $DI_{pq}$ be
defined by (12):


$$DI_{pq} = \frac{1}{n(m-1)}\,[mc_{pq} - c_{p.} - c_{.q} + c_{..}], \tag{18}$$

where m = number of persons to be assigned,

n = number of jobs to be filled,

$c_{pq}$ = productivity of the pth person on the qth job,


$$c_{p.} = \sum_{j=1}^{n} c_{pj}, \qquad c_{.q} = \sum_{i=1}^{m} c_{iq}, \qquad c_{..} = \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}.$$
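Definition (18) translates directly into a short routine. The following is a minimal Python sketch, assuming the productivities are held in a list of m rows and n columns; the function name `disposition_index` is illustrative and not from the article. Exact fractions are used so that the values can be compared with hand computation (note that `Fraction` reduces 22/6 to 11/3).

```python
from fractions import Fraction

def disposition_index(c):
    """DI_pq of equation (18) for an m-by-n productivity array.

    Assumes m >= n >= 1 and m >= 2 (the formula divides by m - 1).
    """
    m, n = len(c), len(c[0])
    row = [sum(r) for r in c]                          # c_p.
    col = [sum(r[q] for r in c) for q in range(n)]     # c_.q
    total = sum(row)                                   # c_..
    return [[Fraction(m * c[p][q] - row[p] - col[q] + total, n * (m - 1))
             for q in range(n)] for p in range(m)]

# The 3-by-3 productivity array of the earlier counseling example.
c = [[8, 7, 6],
     [5, 1, 0],
     [6, 4, 1]]
DI = disposition_index(c)
for r in DI:
    print([str(v) for v in r])
```

Each row and each column of the resulting array sums to $c_{..}/n$ in the square case, which gives a quick check on any hand or machine computation.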


The Disposition Index in a Counseling Assignment System 


The disposition index, DI, reflects the relative merits of making a 
particular placement based upon information about the entire productivity 
array. The first step in using the DI would be to compute the entire matrix 
of $DI_{pq}$; that is, compute $DI_{pq}$ for every person on every job. If the entire
DI matrix is available, placement could proceed by placing the largest DI 
first, next largest second, and so on until all placements have been made. If 
elaborate data processing equipment is available, the DI matrix can be 
computed after each placement to reflect the change of conditions. This 
should tend to provide an assignment sum that is very nearly optimal. In 
any case, the reduced matrix of DI can be computed after, say, every ith 
placement with the frequency of updating determined by the speed of avail- 
able computing facilities. In actual operation, it would probably be desirable 
to update the DI matrix at the end of each day and at the same time distribute 
to counselors the DI’s of the personnel to be placed the following day.

Consider the application of DI’s to the simple problem presented pre- 
viously. The productivity array, complete with row and column sums, is: 








                    Jobs
                 1    2    3  |  $c_{p.}$
            1 |  8    7    6  |  21
Persons     2 |  5    1    0  |   6
            3 |  6    4    1  |  11
     $c_{.q}$ | 19   12    7  |  38 = $c_{..}$














It is now possible to compute the DI matrix.

$$DI_{pq} = \frac{1}{3(2)}\,[3c_{pq} - c_{p.} - c_{.q} + c_{..}].$$

$$DI_{11} = \tfrac{1}{6}[3(8) - 21 - 19 + 38] = \tfrac{1}{6}[24 - 21 - 19 + 38] = \tfrac{1}{6}[22] = 22/6.$$
$$DI_{12} = \tfrac{1}{6}[3(7) - 21 - 12 + 38] = \tfrac{1}{6}[21 - 21 - 12 + 38] = \tfrac{1}{6}[26] = 26/6.$$
$$DI_{13} = \tfrac{1}{6}[3(6) - 21 - 7 + 38] = \tfrac{1}{6}[18 - 21 - 7 + 38] = \tfrac{1}{6}[28] = 28/6.$$
The complete set of DI’s is obtained by similar computations. 


                               Jobs
                          1      2      3   |  $\sum_{j=1}^{3} DI_{pj}$
                    1 |  22/6   26/6   28/6 |  38/3
Persons             2 |  28/6   23/6   25/6 |  38/3
                    3 |  26/6   27/6   23/6 |  38/3
$\sum_{i=1}^{3} DI_{iq}$ |  38/3   38/3   38/3 |  38 = $c_{..}$


In this problem the three highest DI’s can be selected and the indicated 
placements made. Man one would be placed on job number three, man two 
on job one, and man three on job two; this would result in the maximum 
sum $c_{13} + c_{21} + c_{32} = 15$.

Notice what the counselor would do if man one could not, for some valid 
reason, be placed on job three. The counselor would not place man one on 
job one as indicated by his highest productivity but would place him in job 
two where his second highest DI is located, DI = 26/6. After making this 
assignment, the counselor would continue to fill the jobs according to values 
of the disposition index. The result would be an assignment that has the 
second highest possible value, $c_{12} + c_{21} + c_{33} = 13$.

The next example is selected to demonstrate when the procedure will 
not give a maximum assignment sum if only one DI matrix is computed. 
Consider the productivity array shown below. 








                    Jobs
                 1    2    3  |  $c_{p.}$
            1 |  1    4    0  |   5
Persons     2 |  1    7    6  |  14
            3 |  4    7    7  |  18
     $c_{.q}$ |  6   18   13  |  37 = $c_{..}$








The DI matrix is:


                               Jobs
                          1      2      3   |  $\sum_{j=1}^{3} DI_{pj}$
                    1 |  29/6   26/6   19/6 |  37/3
Persons             2 |  20/6   26/6   28/6 |  37/3
                    3 |  25/6   22/6   27/6 |  37/3
$\sum_{i=1}^{3} DI_{iq}$ |  37/3   37/3   37/3 |  37 = $c_{..}$


The three highest values of $DI_{pq}$ are $DI_{11} = 29/6$, $DI_{23} = 28/6$, and
$DI_{33} = 27/6$. Since $DI_{23}$ and $DI_{33}$ involve the same job, if man one is placed
on job one, and man two is placed on job three, then it will be necessary to
place man three on job two. This assignment will result in a sum which is
not optimum, $c_{11} + c_{23} + c_{32} = 14$. However, if after placing the first man
on job one, a new DI matrix is computed, an optimal sum will result.


                           Jobs
                          2      3   |  $\sum_{j=2}^{3} DI_{pj}$
                    2 |  14/2   13/2 |  27/2
Persons             3 |  13/2   14/2 |  27/2
$\sum_{i=2}^{3} DI_{iq}$ |  27/2   27/2 |  27 = $c_{..}$





From the new DI array it is clear that man two should be placed on job 
two and man three on job three. This would result in the maximum sum 
$c_{11} + c_{22} + c_{33} = 15$.
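The placement procedure used in the two 3 × 3 examples can be sketched as a greedy routine. This is an illustrative reading of the text rather than a procedure specified by the author. The names `place_by_di` and `preplaced` are hypothetical, rows 1 and 3 of the second array are inferred from its marginal totals and DI values, and the fallback used when a single person remains (rank by productivity, since (18) would divide by m - 1 = 0) is an assumption.

```python
from fractions import Fraction

def disposition_index(c, persons, jobs):
    """DI_pq (equation 18) on the sub-array of still-open persons and jobs."""
    m, n = len(persons), len(jobs)
    if m == 1:
        # Assumption: with one person left, (18) is undefined, so his open
        # jobs are simply ranked by productivity.
        return {(p, q): Fraction(c[p][q]) for p in persons for q in jobs}
    row = {p: sum(c[p][q] for q in jobs) for p in persons}
    col = {q: sum(c[p][q] for p in persons) for q in jobs}
    total = sum(row.values())
    return {(p, q): Fraction(m * c[p][q] - row[p] - col[q] + total, n * (m - 1))
            for p in persons for q in jobs}

def place_by_di(c, recompute=False, preplaced=None):
    """Repeatedly make the open placement with the largest DI.

    recompute=True rebuilds the DI array on the reduced array after every
    placement; preplaced maps persons to jobs already fixed by the counselor.
    """
    persons = list(range(len(c)))
    jobs = list(range(len(c[0])))
    di = disposition_index(c, persons, jobs)
    placement = dict(preplaced or {})
    for p, q in placement.items():
        persons.remove(p)
        jobs.remove(q)
    while persons and jobs:
        if recompute:
            di = disposition_index(c, persons, jobs)
        p, q = max(((p, q) for p in persons for q in jobs),
                   key=lambda pq: di[pq])
        placement[p] = q
        persons.remove(p)
        jobs.remove(q)
    return placement, sum(c[p][q] for p, q in placement.items())

first = [[8, 7, 6], [5, 1, 0], [6, 4, 1]]
second = [[1, 4, 0], [1, 7, 6], [4, 7, 7]]   # rows 1 and 3 inferred

print(place_by_di(first))                     # one DI array
print(place_by_di(first, preplaced={0: 1}))   # man one put on job two instead
print(place_by_di(second))                    # one DI array
print(place_by_di(second, recompute=True))    # DI rebuilt after each placement
```

On these arrays the four calls give sums of 15, 13, 14, and 15 respectively, matching the text: a single DI array suffices for the first example, while the second reaches its maximum only when the index is recomputed on the reduced array.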

Now consider a much larger assignment problem which involves assign- 
ment of three different kinds of people to five different kinds of jobs. The 










following array presents, rather than productivities, values which might
represent the cost of placing a given type of person in a given type of job. The
matrix is bordered by the frequencies of men available and jobs to be filled,
as well as by row and column totals.


                              Job Types                    Persons
                   1       2       3       4       5      Available    $c_{p.}$
            1     57      60      55      54      62          40        13940
Person      2     53      52      50      59      51          80        12890
Types       3     58      63      61      56      64         120        14550
Job Quota         10      20      30      80     100         240
$c_{.q}$       13480   14120   13520   13600   14240                3,334,800 = $c_{..}$


A solution based upon such a cost matrix requires a minimization rather 
than a maximization process. Consequently, it will be necessary to select 
the smallest values of the DI matrix. From the marginal totals it is then 
possible to compute the DI matrix. 


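$$DI = \frac{1}{57{,}360}\begin{bmatrix} 3{,}321{,}060 & 3{,}321{,}140 & 3{,}320{,}540 & 3{,}320{,}220 & 3{,}321{,}500 \\ 3{,}321{,}150 & 3{,}320{,}270 & 3{,}320{,}390 & 3{,}322{,}470 & 3{,}319{,}910 \\ 3{,}320{,}690 & 3{,}321{,}250 & 3{,}321{,}370 & 3{,}320{,}090 & 3{,}321{,}370 \end{bmatrix}$$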


Starting with the smallest value and placing the personnel in ascending 
order of DI, the following minimum sum assignment is obtained: 








                          Job Types                  Persons
                   1     2     3     4     5        Available
            1           10    30                        40
Person      2                             80            80
Types       3     10    10          80    20           120
Job Quota         10    20    30    80   100           240








The sum associated with this assignment is 










10(60) + 30(55) + 80(51) + 10(58) + 10(63) + 80(56) + 20(64) = 13,300. 


This example provides an optimal sum without recomputing the DI matrix. 
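The larger example can be reproduced with a few lines of code. The sketch below is illustrative and rests on one reading of the bordered table: row totals $c_{p.}$ are weighted by the job quotas, column totals $c_{.q}$ by the numbers of persons available, and m = n = 240. The function names are hypothetical. Only the bracketed numerators of (18) are computed, since the common divisor n(m - 1) does not affect the ordering.

```python
cost = [[57, 60, 55, 54, 62],
        [53, 52, 50, 59, 51],
        [58, 63, 61, 56, 64]]
available = [40, 80, 120]        # persons of each type
quota = [10, 20, 30, 80, 100]    # openings in each job type

def di_numerators(cost, available, quota):
    """Bracketed term of (18), with margins weighted by the frequencies."""
    m = sum(available)
    n_types = len(quota)
    row = [sum(q * v for q, v in zip(quota, r)) for r in cost]            # c_p.
    col = [sum(a * r[j] for a, r in zip(available, cost))
           for j in range(n_types)]                                        # c_.q
    total = sum(a * rp for a, rp in zip(available, row))                   # c_..
    return [[m * cost[p][j] - row[p] - col[j] + total for j in range(n_types)]
            for p in range(len(available))]

def fill_in_ascending_order(cost, available, quota):
    """Visit cells from smallest DI upward, placing as many persons as the
    remaining supply and the remaining quota allow."""
    num = di_numerators(cost, available, quota)
    left_p, left_q = list(available), list(quota)
    cells = sorted((num[p][j], p, j)
                   for p in range(len(available)) for j in range(len(quota)))
    placed = [[0] * len(quota) for _ in available]
    for _, p, j in cells:
        k = min(left_p[p], left_q[j])
        placed[p][j] = k
        left_p[p] -= k
        left_q[j] -= k
    total_cost = sum(placed[p][j] * cost[p][j]
                     for p in range(len(available)) for j in range(len(quota)))
    return placed, total_cost

numerators = di_numerators(cost, available, quota)
placed, total_cost = fill_in_ascending_order(cost, available, quota)
divisor = sum(quota) * (sum(available) - 1)   # n(m - 1)
print(numerators[0], divisor)
print(placed, total_cost)
```

The numerators agree with the DI matrix above, and the ascending fill reproduces the tabled assignment with a total cost of 13,300.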


Other Possible Disposition Indexes 


It is possible to consider the variances associated with the expected 
sums and obtain more information about the distribution of sums associated 
with each possible placement decision. The variances can be easily computed 
by machine methods and might be incorporated into a useful disposition 
index. 


REFERENCES 


[1] Brogden, H. E. An approach to the problem of differential prediction. Psychometrika, 
1946, 11, 139-154. 

[2] Dwyer, P. S. Solution of the personnel classification problem with the method of optimal 
regions. Psychometrika, 1954, 19, 11-26. 

[3] Dwyer, P. S. The detailed method of optimal regions. Psychometrika, 1957, 22, 43-52. 

[4] Thorndike, R. L. The problem of classification of personnel. Psychometrika, 1950, 15, 
215-235. 

[5] Votaw, D. F., Jr. Methods of solving some personnel-classification problems. Psycho- 
metrika, 1952, 17, 255-266. 

[6] Votaw, D. F., Jr. and Dailey, J. T. Assignment of personnel to jobs. Research Bulletin 
52-24, Air Training Command, Human Resources Research Center. Lackland Air 
Force Base, August, 1952. 


Manuscript received 5/2/57 
Revised manuscript received 7/29/57 




















PSYCHOMETRIKA—VOL. 23, No. 1 
March, 1958 


A RETEST METHOD OF STUDYING PARTIAL KNOWLEDGE 
AND OTHER FACTORS INFLUENCING ITEM RESPONSE* 


Vera T. Brownless and John A. Keats†


AUSTRALIAN COUNCIL FOR EDUCATIONAL RESEARCH 


A method of studying the problem of correction for guessing and other 
problems associated with behavior in the test situation is described and an 
illustrative example presented. As far as the writers are aware this method of 
approach is novel but, at the same time, it covers many of the practical and 
theoretical points raised by other writers as reviewed in the introduction. 


Awareness of some of the problems involved in tests which are presented 
in multiple choice form has existed since the early days of testing. One of 
these problems is based on the fact that the test items can be answered 
correctly by a person with no knowledge in the field being tested. By purely 
random selection from the alternatives presented in each question, such a 
person may obtain a nonzero score on the test. An individual may obtain 
any score from all correct to none correct, although results for a large group 
of such persons are expected to yield a group mean which is equal to (total 
number of questions)/n, where n is the number of choices in each question. 

Previous workers attacked this problem in various ways. Many, recog- 
nizing that guessing goes on to a greater or lesser degree whatever the instruc- 
tions, have recommended some form of correction for guessing. In opposition 
to the idea of making some form of correction, a number of people, in partic- 
ular Holzinger [4] and Gulliksen [3], have noted that, provided all students 
answer all questions, the correction factor makes no difference in the rank 
order of the students. Stanley [7] suggested that although no benefit is 
derived from the correction when the number of omits varies little from one 
student to another, the students’ attitudes to the testing situation may be 
improved. 

It is doubtful that any over-all guessing correction factor improves the 
reliability of the test. It is doubtful that all students are guessing from the 
same number of alternatives; in fact it is quite possible that the more able 
students can eliminate some of the choices and are therefore guessing among 
fewer choices. This problem was considered by Horst [5, 6]; he produced a 
formula that allows for elimination of some choices by some of the students. 
However, as Davis [1] points out, although this formula allows for partial 


*The authors wish to acknowledge help received during discussions with Dr. Frederic
M. Lord and Professor S. S. Wilks.

†Present address, Psychology Department, University of Queensland, Australia.
The death of Mrs. Vera T. Brownless on May 16, 1957, is regretfully announced.




knowledge it does not make allowance for wrong answers which are based on 
misinformation. Davis [1] suggests that when a correction formula is used 
it leads to overcorrection if an examinee has misconceptions, undercorrection 
if he has partial information, and that these two influences tend to cancel out. 

One difficulty in discussing guessing is to find a suitable definition of 
guessing. In this work the authors are using the one given by Granich: “The
tendency to answer questions which are unrecognized either wholly or in
part, when an answer can not be deduced with certainty from such information
as the student possesses” ([2], p. 155). Here no assumption has been made
that an n-choice question actually presents n choices to the student. A
student with some knowledge may be able to eliminate some choices and
thus narrow the field to n - 1, n - 2, ..., or even 2 choices.


Method of Investigation 


To obtain the empirical data for this method it is necessary to administer 
the same test to a group of subjects on two occasions. The time between 
administrations should preferably be short, and no warning should be given 
to the subjects that they are going to be retested. If the responses of the 
subjects to a particular item are examined on the two occasions they will be 
found to fall into one and only one of the ten categories listed in Table 1. 
The number of subjects in each category can readily be obtained and these 
numbers pooled for all items. For example, $T_{rr}$ denotes the number of times
any item was marked correctly at both administrations by any subject. 


Analysis of the Data 


Detailed observation and questioning of subjects while they are taking 
the tests would probably suggest a large number of factors operating to 
produce a given response category. For the present, rather simple assumptions 
will be made, not because they are thought to cover all or even the majority 
of cases, but to facilitate the description of this method of approach. The 
possibility of testing these assumptions on the same data should not be 
overlooked and will be referred to again. It should be noted that the general 
method of analysis suggested here will not only be useful in investigating 
the problem of correction for guessing but might well provide an objective 
method for examining certain factors thought to influence test performance. 
A simple set of assumptions is given below. 

1. At the first administration, all responses are either known 
correctly, guessed, or ‘‘known”’ incorrectly. 

2. At the second administration, all responses are either known 
correctly, guessed, ‘known’ incorrectly, or repeated from 
memory. 

3. No person who knew the correct answer at the first adminis- 
tration will guess at the second. 

















TABLE 1

Possible Response Categories

Category     Type of response to an item     Number of cases
             on two occasions                in the category

    1        right × right                   $T_{rr}$
    2        right × wrong                   $T_{rw}$
    3        wrong × right                   $T_{wr}$
    4        wrong × same wrong              $T_{ww_s}$
    5        wrong × different wrong         $T_{ww_d}$
    6        omit × right                    $T_{or}$
    7        omit × wrong                    $T_{ow}$
    8        omit × omit                     $T_{oo}$
    9        right × omit                    $T_{ro}$
   10        wrong × omit                    $T_{wo}$

4. No person will learn an incorrect response between administrations.

The probability that a person who guesses will guess the right answer
is regarded as unknown but constant for the persons and items under consideration
in the sense that an average figure is required. Obviously subdivisions
of items or people or both can be examined separately if sufficient
data are available and the corresponding average probabilities for subgroups
compared. The problem is to estimate this average probability.

Notation

1/k = the probability of success by guessing.
s = the number of occasions subjects know the correct answer at
both administrations.











t = the number of occasions subjects guess at the first administration
and know the answer at the second.

u = the number of occasions subjects guess at the same item at 
both administrations. 

m = the number of occasions subjects guess at the first adminis- 
tration and repeat the same response from memory at the 
second. 

x = the number of occasions subjects ‘““know’’ the same incorrect 
answer at both administrations. 

y = the number of occasions subjects “know” an incorrect answer 
at the first administration and know the correct answer at the 
second. 


Using this notation as well as that of Table 1 with the assumptions made,
it follows that $T_{rw}$, the number of occasions subjects gave the correct answer
on the first occasion and an incorrect answer on the second occasion, will
equal the product of u, 1/k, and (k - 1)/k, where the last term is the probability
of guessing a wrong answer the second time. Thus,














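$$T_{rw} = \frac{(k-1)u}{k^2}. \tag{1}$$

In a similar way the following four equations can be derived.

$$T_{rr} = s + \frac{t}{k} + \frac{u}{k^2} + \frac{m}{k}. \tag{2}$$

$$T_{wr} = \frac{(k-1)t}{k} + \frac{(k-1)u}{k^2} + y. \tag{3}$$

$$T_{ww_s} = \frac{(k-1)u}{k^2} + \frac{(k-1)m}{k} + x. \tag{4}$$

$$T_{ww_d} = \frac{(k-1)(k-2)u}{k^2}. \tag{5}$$

From (1) and (5), k and u can be estimated.

$$k = \frac{T_{ww_d}}{T_{rw}} + 2. \tag{6}$$

$$u = \frac{k^2\,T_{rw}}{k-1}. \tag{7}$$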

Although the remaining constants cannot be estimated from the data,
it is clear that the difference $T_{wr} - T_{rw}$ is related to the amount of learning
during and between the testings, and the difference $T_{ww_s} - T_{rw}$ is related to
the extent of fixation on a particular wrong response. If the material in the
test is of an unfamiliar nature it might be safe to assume that there is no
prior knowledge and thus that x and y are both zero. In this case s, t, and m
can be obtained explicitly with the following result:


$$s = T_{rr} - (T_{wr} + T_{ww_s} - T_{rw})/(k-1), \tag{8}$$

$$t = (T_{wr} - T_{rw})\,k/(k-1), \tag{9}$$

and

$$m = (T_{ww_s} - T_{rw})\,k/(k-1). \tag{10}$$
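The algebra of (1) through (10) can be checked by a round trip: choose values for the constants, generate the expected category counts from (1)-(5), and recover the constants from (6)-(10). The Python sketch below is illustrative only. The dictionary keys and the chosen parameter values are hypothetical, and the recovery of s, t, and m assumes x = y = 0 as in the text.

```python
from fractions import Fraction

def expected_counts(k, s, t, u, m, x=0, y=0):
    """Expected category counts from equations (1)-(5)."""
    k = Fraction(k)
    return {
        "rw": (k - 1) * u / k**2,                              # (1)
        "rr": s + t / k + u / k**2 + m / k,                    # (2)
        "wr": (k - 1) * t / k + (k - 1) * u / k**2 + y,        # (3)
        "ww_same": (k - 1) * u / k**2 + (k - 1) * m / k + x,   # (4)
        "ww_diff": (k - 1) * (k - 2) * u / k**2,               # (5)
    }

def solve(T):
    """Equations (6)-(10); (8)-(10) assume x = y = 0."""
    k = Fraction(T["ww_diff"]) / T["rw"] + 2                          # (6)
    u = k**2 * T["rw"] / (k - 1)                                      # (7)
    s = T["rr"] - (T["wr"] + T["ww_same"] - T["rw"]) / (k - 1)        # (8)
    t = (T["wr"] - T["rw"]) * k / (k - 1)                             # (9)
    m = (T["ww_same"] - T["rw"]) * k / (k - 1)                        # (10)
    return {"k": k, "u": u, "s": s, "t": t, "m": m}

# Hypothetical constants: four effective choices, 320 responses in all.
T = expected_counts(k=4, s=100, t=40, u=160, m=20)
print({name: str(v) for name, v in T.items()})
print({name: str(v) for name, v in solve(T).items()})
```

With real data the counts are subject to sampling error, so the recovered values are estimates rather than exact solutions; the round trip shows only that the equations are mutually consistent.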


A second estimate of k can be obtained by considering patterns of re- 
sponses involving the omission of a response to an item at either or both 
testings. 


Notation

z = the number of occasions a person omits at the first administration
and knows the answer at the second.
a = the number of occasions a person omits at the first administration
and guesses at the second.
b = the number of occasions a person omits at both administrations.
c = the number of occasions a person guesses at the first administration
and omits at the second administration.


A further assumption is required which is in line with the assumptions 
already listed. It is assumed that persons who know the answer at the first 
administration will not omit a response at the second administration. 

With this assumption and the notation already given, it is possible to 
derive five more equations in the way illustrated above. 


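$$T_{or} = z + \frac{a}{k}. \tag{11}$$

$$T_{ow} = (k-1)a/k. \tag{12}$$

$$T_{oo} = b. \tag{13}$$

$$T_{ro} = c/k. \tag{14}$$

$$T_{wo} = (k-1)c/k. \tag{15}$$

The solutions for the unknown quantities are as follows:

$$k = \frac{T_{wo}}{T_{ro}} + 1. \tag{16}$$

$$c = T_{ro} + T_{wo}. \tag{17}$$

$$b = T_{oo}. \tag{18}$$

$$a = \frac{k\,T_{ow}}{k-1}. \tag{19}$$

$$z = T_{or} - \frac{T_{ow}}{k-1}. \tag{20}$$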

With some tests and under certain conditions of administration, the 
total number of times a person omits an item may be insufficient to give 
reliable estimates of the constants. In particular, the estimate of k might be 
based on a relatively small number of cases. This may not be unsatisfactory 
if this estimate is being calculated only as a check on the value obtained by 
the method which does not consider omitted items, but it must be noted 
that in the case of two-choice items this is the only method of estimating k. 
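The omission-based solution can be checked in the same round-trip fashion. The sketch below is illustrative: the constants are hypothetical, the dictionary keys are not from the article, and the forms used for a and z are those that follow algebraically from (11), (12), and (16). It uses a two-choice case, where (6) is unavailable because no "different wrong" response exists.

```python
from fractions import Fraction

def omit_counts(k, z, a, b, c):
    """Expected counts for the categories involving an omission, (11)-(15)."""
    k = Fraction(k)
    return {
        "or": z + a / k,          # (11) omit x right
        "ow": (k - 1) * a / k,    # (12) omit x wrong
        "oo": b,                  # (13) omit x omit
        "ro": c / k,              # (14) right x omit
        "wo": (k - 1) * c / k,    # (15) wrong x omit
    }

def solve_omits(T):
    """Solutions (16)-(20) for k, c, b, a, and z."""
    k = Fraction(T["wo"]) / T["ro"] + 1      # (16)
    c = T["ro"] + T["wo"]                    # (17)
    b = T["oo"]                              # (18)
    a = k * T["ow"] / (k - 1)                # (19)
    z = T["or"] - T["ow"] / (k - 1)          # (20)
    return {"k": k, "z": z, "a": a, "b": b, "c": c}

# Hypothetical two-choice test.
T = omit_counts(k=2, z=12, a=10, b=7, c=8)
print({name: str(v) for name, v in solve_omits(T).items()})
```

Because the estimate of k rests only on $T_{ro}$ and $T_{wo}$, it becomes unstable when few responses are omitted at the second administration, which is the caution raised in the text.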

Since the primary interest of this type of investigation is the estimation 
of k, it is important to examine the nature of this estimate. For this purpose 
consider a person who is guessing between n alternatives for a number of 
items. Let $k = k_n$, where $k_n$ is the estimate of k obtained from (6).


$$k_n - 2 = T_{ww_d,n}/T_{rw,n}. \tag{21}$$


This procedure can be repeated for further groups of items provided 
that within each group the subject is guessing from the same number of 
alternatives. In practice it is not possible to isolate these groups. The method 
outlined above yields an average of the following kind: 


$$k - 2 = \frac{\sum_n T_{ww_d,n}}{\sum_n T_{rw,n}}. \tag{22}$$

It may be difficult to justify this method of averaging over others that
suggest themselves in theory. In practice this is the type of average that is
given by the present method, and no more satisfactory method has so far
been devised for estimating k.
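The kind of average produced by pooling can be made concrete. In the sketch below, which uses hypothetical counts for three groups of items answered by guessing among 2, 3, and 5 alternatives, the pooled estimate from summed frequencies coincides with the mean of the group values $k_n$ weighted by $T_{rw,n}$. This is one way of reading (22) and is offered as an illustration, not as the authors' derivation.

```python
from fractions import Fraction

# Hypothetical (T_rw, T_ww_d) counts per group, keyed by the number of
# alternatives actually guessed among.
groups = {2: (40, 0), 3: (30, 30), 5: (20, 60)}

# Equation (21): group-level estimates k_n = T_ww_d,n / T_rw,n + 2.
k_n = {n: Fraction(d, rw) + 2 for n, (rw, d) in groups.items()}

# Pooled estimate from the summed frequencies.
sum_rw = sum(rw for rw, _ in groups.values())
sum_d = sum(d for _, d in groups.values())
pooled = Fraction(sum_d, sum_rw) + 2

# The same value as a T_rw-weighted mean of the k_n.
weighted = sum(rw * k_n[n] for n, (rw, _) in groups.items()) / sum_rw

print({n: str(v) for n, v in k_n.items()}, pooled, weighted)
```

Groups with more right-then-wrong responses therefore carry more weight in the pooled figure, whatever their share of the items.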


An Illustrative Example 


To illustrate the type of results obtained by this approach, data were 
analyzed for 78 cases from two schools. Each subject had been given two 
administrations of each of two tests with a period of one week between 
administrations. The tests used were a mixed verbal and number general 
ability test (A.C.E.R. Intermediate D) and a nonverbal test involving 
problems with line figures (Jenkins Test). The frequency of all possible pairs 
of responses to a given item was tallied, but as there were very few occasions 
on which an item was omitted, response categories involving an omission are 
not presented. In Table 2 appear the frequencies for the two tests. 

The value of k obtained by applying (22) to these data is 3.6 for Inter- 
mediate D and 3.5 for Jenkins Nonverbal. Thus, although these tests both 
involved five-choice items, the effective number of choices appears to be 







TABLE 2

Summed Frequencies in Response Categories
for Illustrative Example

                    $T_{rr}$   $T_{rw}$   $T_{wr}$   $T_{ww_s}$   $T_{ww_d}$   Total
Intermediate D        1330       183        285        472          293        2563
Nonverbal             3405       407        939        565          598        5914





about three and one-half as an average over persons and items. A point of 
contrast between the two tests is suggested by the relatively high value of
$T_{ww_s}$ and low value of $T_{wr}$ for Intermediate D as contrasted with Jenkins
Nonverbal. This result suggests that the familiar verbal and number items 
involved more misconceptions and recall of wrong responses than the un- 
familiar items involving classification of line drawings. The latter items, 
however, showed a greater amount of learning between trials. 

It is emphasized that these results are presented to illustrate the method 
and not to prove anything about the tests. The number of cases is not large 
and the time between administrations is longer than would ideally be used. 
However, the results obtained do not appear unreasonable and indicate 
that further studies of this kind would be worthwhile. 


REFERENCES 


[1] Davis, F. B. Item analysis in relation to educational and psychological testing. Psychol.
Bull., 1952, 49, 97-121.

[2] Granich, L. A technique for experimentation on guessing in objective tests. J. educ.
Psychol., 1931, 22, 145-156.

[3] Gulliksen, H. Theory of mental tests. New York: Wiley, 1950.

[4] Holzinger, K. J. On scoring multiple response tests. J. educ. Psychol., 1924, 15, 445-447.

[5] Horst, A. P. The chance element in the multiple choice item. J. gen. Psychol., 1932,
6, 209-211.

[6] Horst, A. P. The difficulty of a multiple choice test item. J. educ. Psychol., 1933, 24,
229-232.

[7] Stanley, J. C. "Psychological" correction for chance. J. exp. Educ., 1954, 22, 297-298.


Manuscript received 8/17/56 


Revised manuscript received 4/15/57 








PSYCHOMETRIKA—VOL. 23, NO. 1
MARCH, 1958


THE MEASUREMENT OF FUNCTION FLUCTUATION 


R. F. GARSIDE


UNIVERSITY OF DURHAM, ENGLAND. 


A method of measuring function fluctuation is suggested and an 
appropriate test of significance is indicated. The proposed method is com- 
pared with bi-factor analysis and with some other suggested methods of 
measuring function fluctuation. 


The literature on function fluctuations has recently been summarized
by Anderson [1]. He considers the various methods which have been proposed
and concludes that those suggested by Thouless [12] and Finney [6] not only
give similar results but are the best simple methods. Mahmoud ([9], p. 131),
however, has stated that Thouless's index of function fluctuation gives
results which "seem far too high." Moreover, Finney has intimated [4] that
his paper, which Anderson [1] refers to, was a "hurriedly prepared private
document" not intended for published discussion.

The accuracy of psychological prediction is limited by the amount of
fluctuation in the mental function under investigation. The measurement of
such fluctuation is therefore important. Yet it appears that there is no general
agreement as to how function fluctuation is best measured. To suggest a
suitable method of measurement is the purpose of the present paper.


Definition of Function Fluctuation 


Suppose that a group of people are tested on two occasions, that the
tests measure a common factor, g, and that the true g scores obtained on
each occasion, $g_1$ and $g_2$, are standardized so that the variance of $g_1$ equals
that of $g_2$. By fluctuation in function, we mean that the changes in true g
scores between occasions $(g_2 - g_1)$ are not constant for all testees. If $(g_2 - g_1)$
is constant, then the function is stable.

Mahmoud ([9], p. 130) refers to such function stability as person stability.
It is admitted that person instability is probably a better phrase than function
fluctuation, because unequal fluctuations in function are implied rather than
fluctuations as such. Nevertheless, the term function fluctuation will be
used since it has usually been used in the past to indicate this concept.


Coefficient of Function Stability and of Function Fluctuation 


Define the coefficient of function stability, $R_{FS}$, as the ratio of stable
variance in the general factor to variance in a factor general to the same tests
given on a single occasion. The coefficient of function fluctuation, $R_{FF}$,
may be defined as $1 - R_{FS}$. Thus,

$$ R_{FS} = 1 - R_{FF} = \frac{V_s}{V_g}, \tag{1} $$

where $V_s$ = the variance of s, the stable part of $g_1$ and $g_2$.


Now suppose that, for each person tested, there is a series of true g
scores, each g score being obtained on a different occasion. Then, for a person
i,

$$ g_{ip} = s_i + d_{ip}, \tag{2} $$

where $g_{ip}$ = g score of person i on occasion p,
$s_i$ = stable score of person i,
$d_{ip}$ = score of person i associated with occasion p.
On each occasion, a set of g scores will be obtained. We may postulate
that these sets of g scores are all parallel to each other. Then, if $s_i$ is defined as

$$ s_i = \lim_{k \to \infty} \frac{\sum_{p=1}^{k} g_{ip}}{k}, \tag{3} $$

Gulliksen ([7], pp. 28-31) has shown that

$$ \frac{V_s}{V_g} = r_{g_1 g_2}, \tag{4} $$

where $V_s$ = variance of stable scores,
$V_g$ = variance of the set of g scores obtained on any one occasion,
$r_{g_1 g_2}$ = correlation between any two such sets of g scores.
Thus, to consider two such sets (or occasions),

$$ r_{g_1 g_2} = \frac{V_s}{V_g} = R_{FS} = 1 - R_{FF}. \tag{5} $$


Neither $R_{FS}$ nor $r_{g_1 g_2}$ can be negative. If $R_{FS} = 0$, $R_{FF} = 1$, and function
fluctuation is at a maximum. It should be noted that $g_1$ and $g_2$ refer to true
g scores. Thus, $R_{FF}$ and $R_{FS}$ are independent of errors of measurement and,
therefore, they indicate the extent to which function fluctuation, as such,
limits the accuracy of psychological prediction.

In order to measure $r_{g_1 g_2}$, and accordingly $R_{FS}$ and $R_{FF}$, the plan of
using a number of different, not parallel tests, will be adopted. At least two
tests must be given on one occasion and at least two other tests on a subsequent
occasion. Hence the number of tests must be four or more. No test is given
twice, but the same testees take all the tests. This plan differs from that of
Thouless [12], who suggests giving two tests twice. It also differs from Dunlap's
[3] plan of using four parallel tests given on two occasions.

An essential part of the proposed plan is that the tests must be chosen
so that, when all the tests are given to a separate group of testees on one
occasion, they measure one general factor and no group factors. Whether the 
intercorrelations so obtained are consistent with this requirement may be 
ascertained by carrying out a factor analysis or calculating tetrad differences 
and applying the appropriate tests of significance. An exact test of the 
significance of tetrad differences has been given by Wishart [14]. In our 
design, the tests given on the first occasion must not be parallel to those 
subsequently given, unless all the tests are parallel to each other. Their means, 
standard deviations, reliability coefficients and specific factor loadings may 
all differ from test to test. 

Strictly speaking, the tests given on the same occasion should be
administered simultaneously. This may be achieved by combining the tests
into a composite test, each subtest providing items in rotation. It should be 
remembered, however, that such an arrangement is sound only if the tests 
are power rather than speed tests. If speed is an important factor, the tests 
must be given separately. 

To simplify the derivation, consider the case when only four tests are 
used. The derivation may easily be extended to cover five or more tests. 
If A, B, C, and D represent true scores of the tests and if A and B are obtained 
at the first occasion and C and D at the second testing then, since the general 
factor, g, is the sole source of correlation between the tests, 


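$$ r_{AB} = r_{Ag_1} r_{Bg_1} \tag{6} $$

and

$$ r_{CD} = r_{Cg_2} r_{Dg_2}. \tag{7} $$

But $g_1$ is the sole source of correlation between $g_2$ and A or B. Therefore

$$ r_{AC} = r_{Ag_1} r_{g_1 g_2} r_{Cg_2}, \tag{8} $$

$$ r_{AD} = r_{Ag_1} r_{g_1 g_2} r_{Dg_2}, \tag{9} $$

$$ r_{BC} = r_{Bg_1} r_{g_1 g_2} r_{Cg_2}, \tag{10} $$

and

$$ r_{BD} = r_{Bg_1} r_{g_1 g_2} r_{Dg_2}. \tag{11} $$

Substituting (5) in (8), (9), (10), and (11) and multiplying,

$$ R_{FS}^4 = \frac{r_{AC} r_{AD} r_{BC} r_{BD}}{r_{Ag_1}^2 r_{Bg_1}^2 r_{Cg_2}^2 r_{Dg_2}^2}. \tag{12} $$

Substituting (6) and (7) in (12),

$$ R_{FS}^4 = \frac{r_{AC} r_{AD} r_{BC} r_{BD}}{r_{AB}^2 r_{CD}^2}. \tag{13} $$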











Multiplying numerator and denominator of (13) by the variances of A, B,
C, and D,

$$ R_{FS}^4 = \frac{C_{AC} C_{AD} C_{BC} C_{BD}}{C_{AB}^2 C_{CD}^2}, \tag{14} $$

where C indicates covariance.

If it is assumed that errors of measurement are uncorrelated with one
another or with true scores, then the covariance between the true scores of
any two tests equals the covariance between the obtained scores. Thus (14)
becomes

$$ R_{FS} = \frac{(C_{ac} C_{ad} C_{bc} C_{bd})^{1/4}}{(C_{ab} C_{cd})^{1/2}} = \frac{(r_{ac} r_{ad} r_{bc} r_{bd})^{1/4}}{(r_{ab} r_{cd})^{1/2}}, \tag{15} $$

where a, b, c, and d refer to obtained scores.

Should $r_{ac} r_{ad} r_{bc} r_{bd}$ or $r_{ab} r_{cd}$ be negative, it merely means that the test
scores of one or more tests have been inverted. Equation (15) is similar in
form to Yule's attenuation formula (Spearman [11], p. 294). The coefficient
of function fluctuation, $R_{FF}$, is given by

$$ R_{FF} = \frac{(C_{ab} C_{cd})^{1/2} - (C_{ac} C_{ad} C_{bc} C_{bd})^{1/4}}{(C_{ab} C_{cd})^{1/2}} = \frac{(r_{ab} r_{cd})^{1/2} - (r_{ac} r_{ad} r_{bc} r_{bd})^{1/4}}{(r_{ab} r_{cd})^{1/2}}. \tag{16} $$

If five tests are used a similar derivation gives

$$ R_{FS}^6 = \frac{r_{13} r_{14} r_{15} r_{23} r_{24} r_{25}}{r_{12}^3 \, r_{34} r_{35} r_{45}}, \tag{17} $$

where tests 1 and 2 are given on the first occasion and tests 3, 4, and 5 on a
subsequent occasion. There is no difficulty in deriving $R_{FS}$ for six or more tests.
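
The four-test computation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample correlations are hypothetical, and the formula is the one implied by (13), namely the fourth root of the product of the four between-occasion correlations divided by the square root of the product of the two within-occasion correlations.

```python
import math

def function_stability(r_ab, r_cd, r_ac, r_ad, r_bc, r_bd):
    """Four-test coefficient of function stability.

    Tests a and b are taken on the first occasion, c and d on the second.
    r_ab and r_cd are within-occasion correlations; the other four are
    between-occasion correlations.
    """
    between = r_ac * r_ad * r_bc * r_bd
    within = r_ab * r_cd
    if between <= 0 or within <= 0:
        # A negative product suggests one or more tests are scored in
        # the inverted direction; reflect those scores first.
        raise ValueError("products of correlations must be positive")
    return between ** 0.25 / math.sqrt(within)

def function_fluctuation(r_ab, r_cd, r_ac, r_ad, r_bc, r_bd):
    """Coefficient of function fluctuation, 1 minus the stability."""
    return 1.0 - function_stability(r_ab, r_cd, r_ac, r_ad, r_bc, r_bd)

# Made-up correlations, for illustration only.
r_fs = function_stability(r_ab=0.60, r_cd=0.50,
                          r_ac=0.48, r_ad=0.42, r_bc=0.40, r_bd=0.35)
print(round(r_fs, 3), round(1.0 - r_fs, 3))
```

When the between-occasion correlations are as large as the within-occasion ones, the coefficient approaches unity and no fluctuation is indicated.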


Mean of $R_{FS}$ and of $R_{FF}$


The question now arises as to whether $\bar{R}_{FS}$ and $\bar{R}_{FF}$, the mean values
obtained from samples, provide unbiased estimates of $R_{FS}$ and $R_{FF}$, the
population parameters. Wishart ([14], pp. 184-185) has shown that, when N
is large, both $C_{ac} C_{ad} C_{bc} C_{bd}$ and $C_{ab} C_{cd}$ approach the corresponding population
parameters. Thus $\bar{R}_{FS}$ and $\bar{R}_{FF}$ provide satisfactory estimates of $R_{FS}$ and
$R_{FF}$, respectively, when N is large.


Significance of $R_{FF}$


If the function tested fluctuates between testings, then the intercorre-
lations between tests will reflect not only a general factor, but also group
factors associated with occasions. This was pointed out by Dunlap ([3],
p. 448). Thus the significance of $R_{FF}$ may be tested by simply ascertaining
the significance of the appropriate tetrad differences in the usual way (Wishart
[14]). When four tests are given, these differences are $r_{ab} r_{cd} - r_{ac} r_{bd}$ and
$r_{ab} r_{cd} - r_{ad} r_{bc}$.

It is therefore unnecessary to derive the standard error of $R_{FF}$ or of $R_{FS}$.
If, however, the standard error of $R_{FS}$ is required, it may easily be derived
by taking logarithmic differentials (Kelley [8], p. 526) and by using Wishart's
[13] moments. These are reported by Kelley ([8], p. 555).


Bi-factor Analysis 


It has been suggested that a bi-factor analysis carried out on tests given 
on different occasions would indicate the extent of function fluctuation. 
Such an analysis has, in fact, been carried out by Ferguson [5]. He gave three 
parallel tests to the same group of testees, one test being given on each of 
three occasions. He then calculated the fifteen correlations between the 
halves of each test and carried out a bi-factor analysis. He concluded that,
"It is not unlikely that both the correlation of errors and functional varia-
bility are exerting a positive influence on the size of the group factors, and
since no method of determining the relative importance of these two influences
is at the moment apparent, it is only possible to describe these factors as
factors of temporal contiguity." But when a bi-factor analysis is carried
out on correlations among tests designed and administered as described in 
this paper, then the size of the group factor loadings will be affected only by 
function fluctuation. 

For the sake of simplicity, again consider the case of four tests only, 
even though this number of tests would be, of course, insufficient to carry 
out a satisfactory factor analysis. It is assumed that when the four tests 
are given at the same time, they measure a general factor but no group 
factors. Thus, when the tests are given in pairs on two different occasions 
and a bi-factor analysis is carried out, two group factors associated with 
occasions and a general factor will be obtained. 

Note that it is sometimes supposed that it is not possible to carry out a
bi-factor analysis with two group factors only, unless there is at least one
test included which involves neither group factor but the general factor only.
But Burt ([2], p. 56) has indicated a method whereby a bi-factor analysis
may be carried out when every test has a factor loading on one or the other
of the two group factors.

According to our definition of the coefficient of function stability, it 
equals the ratio of the proportion of test variance attributable to the general 
factor to the proportion attributable to both general and group factors. If 
sampling errors are ignored, then this ratio will be constant for all tests, 
since they measure the same general factor. Thus, 


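$$ R_{FS} = \frac{g_a^2}{g_a^2 + p_a^2} = \frac{g_b^2}{g_b^2 + p_b^2} = \frac{g_c^2}{g_c^2 + q_c^2} = \frac{g_d^2}{g_d^2 + q_d^2}, \tag{18} $$

where $g_a$, $g_b$, $g_c$, and $g_d$ are the general factor loadings of the four tests, $p_a$
and $p_b$ are the first group factor loadings, and $q_c$ and $q_d$ the second group
factor loadings. Therefore

$$ R_{FS}^4 = \frac{g_a^2 g_b^2 g_c^2 g_d^2}{(g_a^2 + p_a^2)(g_b^2 + p_b^2)(g_c^2 + q_c^2)(g_d^2 + q_d^2)} = \frac{g_a^2 g_b^2 g_c^2 g_d^2}{(g_a^2 g_b^2 + g_a^2 p_b^2 + g_b^2 p_a^2 + p_a^2 p_b^2)(g_c^2 g_d^2 + g_c^2 q_d^2 + g_d^2 q_c^2 + q_c^2 q_d^2)}. \tag{19} $$

But, from (18),

$$ g_a p_b = g_b p_a, \tag{20} $$

and therefore,

$$ g_a^2 p_b^2 + g_b^2 p_a^2 = 2 g_a g_b p_a p_b. \tag{21} $$

Similarly

$$ g_c^2 q_d^2 + g_d^2 q_c^2 = 2 g_c g_d q_c q_d. \tag{22} $$

Substituting (21) and (22) in (19),

$$ R_{FS}^2 = \frac{g_a g_b g_c g_d}{(g_a g_b + p_a p_b)(g_c g_d + q_c q_d)}. \tag{23} $$

If scores a, b, c, and d are obtained as indicated previously, and if sampling
errors are again ignored, then

$$ r_{ab} = g_a g_b + p_a p_b, \tag{24} $$

$$ r_{cd} = g_c g_d + q_c q_d, \tag{25} $$

$$ r_{ac} = g_a g_c, \tag{26} $$

$$ r_{ad} = g_a g_d, \tag{27} $$

$$ r_{bc} = g_b g_c, \tag{28} $$

and

$$ r_{bd} = g_b g_d. \tag{29} $$

Therefore, substituting (24) to (29) in (23),

$$ R_{FS} = \frac{(r_{ac} r_{ad} r_{bc} r_{bd})^{1/4}}{(r_{ab} r_{cd})^{1/2}}. \tag{30} $$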


Equations (30) and (15) are identical. Therefore, apart from possible differ-
ences arising from sampling errors, the method proposed in a preceding
section and bi-factor analysis provide equal estimates of the coefficients of
function stability and fluctuation. It can be shown, in a similar manner,
that this is also true when more than four tests are used. But the proposed
method is simpler to carry out.
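
The equivalence can be checked numerically. The sketch below is illustrative only: the loadings and the chosen stability of .80 are hypothetical values, and the helper names are not from the paper. Group factor loadings are set so that the ratio in (18) is the same for every test; correlations are then built from (24) to (29), and the coefficient is recovered from the correlations alone.

```python
import math

def correlations_from_bifactor(g, p, q):
    """Build the six correlations from bi-factor loadings.

    g: general factor loadings of tests a, b, c, d
    p: first-occasion group factor loadings of tests a, b
    q: second-occasion group factor loadings of tests c, d
    """
    ga, gb, gc, gd = g
    pa, pb = p
    qc, qd = q
    return {
        "ab": ga * gb + pa * pb,   # same occasion: general + group factor
        "cd": gc * gd + qc * qd,
        "ac": ga * gc,             # different occasions: general factor only
        "ad": ga * gd,
        "bc": gb * gc,
        "bd": gb * gd,
    }

def stability_from_correlations(r):
    """Fourth root of the between-occasion product over the square
    root of the within-occasion product."""
    between = r["ac"] * r["ad"] * r["bc"] * r["bd"]
    return between ** 0.25 / math.sqrt(r["ab"] * r["cd"])

# Hypothetical population values.
chosen_stability = 0.80
g = (0.8, 0.7, 0.6, 0.5)
# A constant ratio p/g for every test makes g^2 / (g^2 + p^2) equal
# to the chosen stability for all four tests.
ratio = math.sqrt(1.0 / chosen_stability - 1.0)
p = (g[0] * ratio, g[1] * ratio)
q = (g[2] * ratio, g[3] * ratio)

recovered = stability_from_correlations(correlations_from_bifactor(g, p, q))
print(round(recovered, 6))
```

The recovered value agrees with the stability built into the loadings, which is the sense in which the two methods give equal estimates apart from sampling error.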


Comparison with other Coefficients 


Paulsen [10] suggested correcting the retest reliability coefficient for 
attenuation due to test error using the split-half reliability coefficient as 
the correction factor. The coefficient obtained by this procedure will measure 
function stability, but Paulsen called it the coefficient of "trait variability."
This coefficient is essentially similar to the proposed coefficient, $R_{FS}$. The
proposed coefficient, however, would seem to be superior in that it utilizes 
more information from the same amount of testing and does not involve the 
split-half reliability, which does not always provide a satisfactory measure 
of test error. 

Thouless [12] suggests using two tests twice in order to test for and 
measure function fluctuation. In our notation tests a and c would be the 
same test administered at different times and so would be tests b and d. 
Thouless seems to mean the same as we do by function fluctuation and, in
fact, points out that if

$$ r_{ab} r_{cd} - r_{ad} r_{bc} > 0, \tag{31} $$

then function fluctuation exists. This tetrad difference is the same as one of
the pair used in testing for function fluctuation. But Thouless considers that
this purpose may be more simply achieved by calculating $r_{(a-c)(b-d)}$. If
this correlation is positive, then function fluctuation exists.

To obtain his index of function fluctuation, Thouless divides $r_{(a-c)(b-d)}$
by the mean of $r_{ab}$ and $r_{cd}$. Accordingly

$$ I_{FF} = \frac{2 r_{(a-c)(b-d)}}{r_{ab} + r_{cd}}, \tag{32} $$

where $I_{FF}$ is Thouless's index of function fluctuation. Thouless assumes
that the standard deviations of a and c and of b and d are equal. He thus
obtains

$$ I_{FF} = \frac{r_{ab} + r_{cd} - r_{ad} - r_{bc}}{(r_{ab} + r_{cd}) \sqrt{(1 - r_{ac})(1 - r_{bd})}}. \tag{33} $$

$I_{FF}$ cannot be directly compared with $R_{FF}$, since the latter is derived
from four separate tests having no group factors. If the same two tests are
given twice, (8) and (11) will no longer hold. It is possible, however, to derive
a coefficient, $R'_{FF}$, similar to $R_{FF}$, using Thouless's experimental design. For
(6), (7), (9), and (10) will still apply to the data obtained. Thus $R'_{FF}$ may be
derived in a manner similar to that of $R_{FF}$:

$$ R'_{FF} = \frac{(r_{ab} r_{cd})^{1/2} - (r_{ad} r_{bc})^{1/2}}{(r_{ab} r_{cd})^{1/2}}. \tag{34} $$






Apart from the factor $\sqrt{(1 - r_{ac})(1 - r_{bd})}$, $I_{FF}$ only differs from $R'_{FF}$ in
that $I_{FF}$ is a function of arithmetic means of pairs of correlation coefficients
whereas $R'_{FF}$ is a function of their geometric means. But test a is the same as
test c, and test b is the same as test d. Therefore, within the limits of sampling
error,

$$ r_{ab} = r_{cd} \tag{35} $$

and

$$ r_{ad} = r_{bc}. \tag{36} $$

Thus

$$ I_{FF} \sqrt{(1 - r_{ac})(1 - r_{bd})} = R'_{FF}. \tag{37} $$

In practice, the factor $\sqrt{(1 - r_{ac})(1 - r_{bd})}$ will be less than unity and
will therefore make $I_{FF}$ greater than $R'_{FF}$; it appears to be an unnecessary
complication. Moreover, as Mahmoud ([9], p. 131) remarks, Thouless's
index gives results which seem too high.

Mahmoud [9] considers the case where several tests are given and then
repeated in the same or parallel form. He derives a coefficient of person
stability, which may be calculated from any number of tests. In the case of
two tests only (i.e., four applications) his coefficient reduces to ([9], p. 129,
equation xvii)

$$ R_{SP} = \frac{r_{ad} + r_{bc}}{r_{ab} + r_{cd}}, \tag{38} $$

where a is parallel to c and b is parallel to d. $R_{SP}$ cannot be compared directly
with $R_{FS}$ because Mahmoud uses parallel tests. But a coefficient $R'_{FS}$ may
be derived, in the same way as $R'_{FF}$, which will be comparable to $R_{SP}$,

$$ R'_{FS} = \frac{(r_{ad} r_{bc})^{1/2}}{(r_{ab} r_{cd})^{1/2}}. \tag{39} $$

The correlations $r_{ab}$ and $r_{cd}$, and also $r_{ad}$ and $r_{bc}$, will again be approximately
equal. Therefore $R'_{FS}$ will give similar results to those of $R_{SP}$, within the
limits of sampling errors. The proposed coefficient $R_{FS}$, however, seems to
provide a more direct indication of the extent to which prediction is limited
by function fluctuation. Moreover, by avoiding the use of parallel tests, $R_{FS}$
utilizes more information from the same amount of testing than does $R_{SP}$.
It is interesting that giving the same tests twice, or using parallel tests, seems
to be a disadvantage in measuring function fluctuation.

Mahmoud ([9], p. 129) states that $R_{SP}$ "measures the extent to which
the relative abilities of a given set of persons, assessed on two or more separate
days, have remained the same, in spite of the interval between the two appli-
cations or (particularly if the interval is short) in spite of the variations in the
conditions that obtained." In order, therefore, to obtain a coefficient of trait
variability, $R_{TV}$, Mahmoud subtracts $R_{SP}$, not from unity, but from his
coefficient of internal consistency. This coefficient depends upon errors of
measurement, and therefore so does $R_{TV}$. The proposed coefficient, $R_{FS}$,
is independent of such errors and for our purpose, therefore, would seem to
be more appropriate than $R_{TV}$. It is true that variations in conditions may
tend to reduce $R_{FS}$, but this effect may be minimized by careful test ad-
ministration.


Example 


For an example, some of Mahmoud's data ([9], p. 121, Table II) will be
used: $r_{ab}$ = .713, $r_{ac}$ = .881, $r_{ad}$ = .637, $r_{bc}$ = .559, $r_{bd}$ = .670, and $r_{cd}$ = .735
(N = 87). From these data, Thouless's coefficient $I_{FF}$ = .878. Without the
factor $\sqrt{(1 - r_{ac})(1 - r_{bd})}$, $I_{FF}$ would equal .174. It is evident that this
factor has a considerable effect, making $I_{FF}$ much greater than $R'_{FF}$, which
equals .176.

Mahmoud's $R_{SP}$ = .826, and the coefficient $R'_{FS}$ = .824. The results
obtained from $R_{SP}$ and $R'_{FS}$ are very similar. The proposed coefficients
$R_{FS}$ and $R_{FF}$, however, include more information from a given amount of
testing than does $R_{SP}$, and their derivation is more direct than that of $R_{SP}$.
Moreover, the proposed coefficients do not entail giving the same tests
twice or the use of parallel tests.
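
The figures quoted in this example can be reproduced with a short calculation. The following Python sketch is an illustration rather than part of the original analysis: the variable names are hypothetical, and the six correlations are those quoted above, with a and c the two applications of one test and b and d the two applications of the other.

```python
import math

# Correlations quoted from Mahmoud's data (N = 87).
r = {"ab": 0.713, "ac": 0.881, "ad": 0.637,
     "bc": 0.559, "bd": 0.670, "cd": 0.735}

within_sum = r["ab"] + r["cd"]     # same-occasion correlations
between_sum = r["ad"] + r["bc"]    # different tests, different occasions

# Thouless's index, with and without the retest factor.
retest_factor = math.sqrt((1 - r["ac"]) * (1 - r["bd"]))
i_ff_without_factor = (within_sum - between_sum) / within_sum
i_ff = i_ff_without_factor / retest_factor

# Coefficients based on geometric means, for the two-tests-twice design.
r_fs_prime = math.sqrt(r["ad"] * r["bc"]) / math.sqrt(r["ab"] * r["cd"])
r_ff_prime = 1 - r_fs_prime

# Mahmoud's coefficient of person stability for two tests.
r_sp = between_sum / within_sum

print(round(i_ff, 3), round(i_ff_without_factor, 3))   # 0.878 0.174
print(round(r_ff_prime, 3), round(r_fs_prime, 3))      # 0.176 0.824
print(round(r_sp, 3))                                   # 0.826
```

The retest factor here is about .198, which accounts for the large gap between Thouless's index and the other coefficients.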


REFERENCES 


[1] Anderson, C. C. Some simple methods of testing for function fluctuation. Brit. J.
Psychol., 1955, 46, 1-12.
[2] Burt, C. Group factor analysis. Brit. J. statist. Psychol., 1950, 3, 40-75.
[3] Dunlap, J. W. Comparable tests and reliability. J. educ. Psychol., 1933, 24, 442-453.
[4] Editorial note. Brit. J. Psychol., 1955, 46, 230.
[5] Ferguson, G. A. A bi-factor analysis of reliability coefficients. Brit. J. Psychol., 1940,
31, 172-182.
[6] Finney, D. J. A note on the measurement of performance fluctuation. Memorandum
to the National Foundation for Educational Research in England and Wales, 1953.
[7] Gulliksen, H. Theory of mental tests. New York: Wiley, 1950. 
[8] Kelley, T. L. Fundamentals of statistics. Cambridge: Harvard Univ. Press, 1947. 
[9] Mahmoud, A. F. Test reliability in terms of factor theory. Brit. J. statist. Psychol., 
1955, 8, 119-135. 
[10] Paulsen, G. B. A coefficient of trait variability. Psychol. Bull., 1931, 23, 218-219. 
[11] Spearman, C. Correlation calculated from faulty data. Brit. J. Psychol., 1910, 3, 
271-295. 
[12] Thouless, R. H. Test unreliability and function fluctuation. Brit. J. Psychol., 1936, 
26, 325-343. 
[13] Wishart, J. The generalized product moment distribution in samples for a normal 
multivariate population. Biometrika, 1928, 20A, 32-52. 
[14] Wishart, J. Sampling errors in the theory of two factors. Brit. J. Psychol., 1928, 
19, 180-187. 
Manuscript received 11/19/56 


Revised manuscript received 4/1/57 

























PSYCHOMETRIKA—VOL. 23, NO. 1
MARCH, 1958


PREDETERMINATION OF TEST WEIGHTS 


PAUL J. HOFFMAN


THE STATE COLLEGE OF WASHINGTON* 


Derivations are presented relating the length of a test to its weight in 
a composite. Tests of varying length are constructed so that their weights 
will be of predetermined magnitudes, and the results compared with expecta- 
tions. Weighting schemes involving standard deviations of raw scores and of 
true scores are compared. An important secondary derivation is presented
from which it is possible to estimate test reliability knowing only the relative
length of a test and its shortened form, and the standard deviation of each.


Given test A with known variance and reliability, one frequently wishes 
to construct a second test, B, such that the relative weights of the two tests for 
determining a composite score will be of some predetermined magnitude. 
Where test B can be experimentally pretested, item analysis procedures 
designed to control the standard deviation and reliability of the test can be 
applied ([1], pp. 375-380). If item parameters cannot be obtained in advance, 
the usual practice is to construct test B without regard to the problem of 
weighting and to apply some transformation to the scores after the test is 
administered and the test parameters determined. 

In many applications, and particularly in the classroom, the person 
responsible for evaluation is not prepared to engage in what seems to him to 
be high-powered statistical manipulations. What is wanted is a way of arriving 
at a composite for each individual member of his class by simply totaling 
the various part scores. For this reason, an attempt is often made to pre- 
determine weights by controlling the number of items in each test. It has 
been shown ([1], pp. 336-341) that the number of items in a test is not a 
necessary determinant of test weight, a fact which might appear to rule out 
this possibility as a solution. It is not known, however, precisely how the 
number of items is likely to affect test weights. Since practical people may 
well continue to justify its use in the lack of strong evidence to the contrary, 
it becomes important to determine the conditions under which weighting 
by controlling the number of items in a test may be successfully employed, 
and the conditions under which it may not. 

The matter is somewhat complicated since the concept of test weight is 
itself not clearly defined. There are a variety of suggestions for equalizing 
the contributions of two or more tests in the absence of a criterion ([3], pp. 
211-213; [4], pp. 88-90) and some suggestions for determining whether a 
given test contributes more than or less than another [5]. Each method implies
a somewhat unique definition of weight. It is not our purpose to re-examine
the problem of the meaning of test weights. Instead, we consider two defini-
tions of test weight and develop the methods for their predetermination on
the basis of length of test.

*Now at University of Oregon.


Weighting by Standard Deviations 


It is often assumed that the effective weight of a test in relation to
another is determined by the ratio of the standard deviations of the two
tests. Thus, if test X has a standard deviation $\sigma_x$ and test Y a standard
deviation $\sigma_y$, the weight of Y in relation to X is given by $W_y = \sigma_y/\sigma_x$.

Now let us assume X is a test of unit length, and that Y is a test of
increased length, such that, in deviation scores, $y = x_1 + x_2 + \cdots + x_k$.
Then

$$ \sigma_y^2 = \frac{\sum (x_1 + x_2 + \cdots + x_k)^2}{N} = \sum_{i=1}^{k} \sigma_{x_i}^2 + \sum_{i=1}^{k} \sum_{j=1}^{k} r_{x_i x_j} \sigma_{x_i} \sigma_{x_j}, \quad (i \neq j). \tag{1} $$

If it is assumed that the components of Y are parallel forms, one may
substitute as follows:

$$ \sigma_{x_i} = \sigma_x; \qquad r_{x_i x_j} = r_{xx}, $$

so that from (1),

$$ \sigma_y^2 = k\sigma_x^2 + k(k - 1) r_{xx} \sigma_x^2. $$

But

$$ W_y^2 = \frac{\sigma_y^2}{\sigma_x^2} = \frac{k\sigma_x^2 + k(k - 1) r_{xx} \sigma_x^2}{\sigma_x^2} = k + k(k - 1) r_{xx}. $$

Therefore,

$$ W_y = \sqrt{k + k(k - 1) r_{xx}}. \tag{2} $$

From (2) it is seen that the effective weight of a test varies directly
as a function of test length and reliability. If the reliability of the unit test
is 1.00,

$$ W_y = \sqrt{k + k(k - 1)} = \sqrt{k + k^2 - k}; \qquad W_y = k. \tag{3} $$

If the reliability of the unit test is zero,

$$ W_y = \sqrt{k}. \tag{4} $$







Considering (3) and (4), the inequality

$$ \sqrt{k} \le W_y \le k $$



makes the dependence of test weight upon length obvious. 

Our main concern, however, is that of finding a value for k that will
result in a predetermined weight $W_y$. To solve (2) for k, first square both
sides:

$$ W_y^2 = k + k(k - 1) r_{xx} = k + k^2 r_{xx} - k r_{xx}. $$

Arranging terms in quadratic form,

$$ r_{xx} k^2 + (1 - r_{xx}) k - W_y^2 = 0; $$

$$ k = \frac{-(1 - r_{xx}) \pm \sqrt{(1 - r_{xx})^2 + 4 r_{xx} W_y^2}}{2 r_{xx}}. \tag{6} $$

Since a negative radical leads to k < 0, only one root is meaningful:

$$ k = \frac{\sqrt{(1 - r_{xx})^2 + 4 r_{xx} W_y^2} - (1 - r_{xx})}{2 r_{xx}}. \tag{7} $$

From (7), one can estimate the relative length of a test that is required in
order to yield a given weight with respect to the unit test.

Example: Assume that the cumulated scores for an individual to the
end of the semester comprise a total of 100 test items. The reliability of the
cumulation is .70. It is desired to construct a final examination which will
equal twice the weight of the other tests. In this example,

$$ r_{xx} = .70, \qquad W_y^2 = 4.00, $$

$$ k = \frac{\sqrt{(.30)^2 + (4)(.70)(4.00)} - .30}{(2)(.70)} = \frac{\sqrt{11.29} - .30}{1.40}, $$

$$ k = 2.19. $$

Then the number of items necessary on the final examination is given by
100k = (100)(2.19) = 219 items.
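
The same computation can be carried out for any reliability and desired weight. The Python sketch below is illustrative only; the function names are hypothetical, and the treatment of a zero reliability, where the quadratic degenerates and (4) applies directly, is an added convenience rather than part of the derivation.

```python
import math

def weight_from_length(k, r_xx):
    """Weight of a test k times the unit length, as in (2)."""
    return math.sqrt(k + k * (k - 1) * r_xx)

def length_for_weight(w_y, r_xx):
    """Relative length k that yields weight w_y, as in (7)."""
    if r_xx == 0:
        # With zero reliability, (4) gives W = sqrt(k), so k = W^2.
        return w_y ** 2
    a = 1 - r_xx
    return (math.sqrt(a * a + 4 * r_xx * w_y ** 2) - a) / (2 * r_xx)

# The worked example: 100 items, reliability .70, desired weight 2.
k = length_for_weight(2.0, 0.70)
print(round(k, 2), round(100 * k))   # 2.19 219
```

Substituting the resulting k back into `weight_from_length` returns the desired weight, which gives a quick check on the arithmetic.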

(It should be noted that the terms of (6) can be rearranged to yield an
expression for $r_{xx}$ in terms of $W_y$ and k. Thus,

$$ r_{xx}(k^2 - k) = W_y^2 - k; \qquad r_{xx} = \frac{W_y^2 - k}{k^2 - k}. $$

This formula for reliability of a shortened form of a test requires only the
standard deviation of the initial test, the standard deviation of the shortened
form, and their relative length.)
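
As an illustration of this secondary result, the reliability of the shortened form can be computed from the two standard deviations and the length ratio. The Python sketch below rests on the same parallel-forms assumption as the derivation; the function name and the numerical values are hypothetical.

```python
def reliability_of_short_form(sd_full, sd_short, k):
    """Reliability of the shortened form from the standard deviations
    and the ratio k of the full length to the shortened length."""
    if k <= 1:
        raise ValueError("k must exceed 1 (the full test must be longer)")
    w_squared = (sd_full / sd_short) ** 2   # squared weight of the full test
    return (w_squared - k) / (k * k - k)

# Hypothetical values: the full test is 2.5 times as long as the
# shortened form and has twice its standard deviation.
r_xx = reliability_of_short_form(sd_full=8.0, sd_short=4.0, k=2.5)
print(round(r_xx, 2))   # 0.4
```

Values outside the range 0 to 1 indicate that the two forms do not behave as parallel components.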











Figure 1 is a nomograph from which k can be quickly determined for any
given $r_{xx}$ and any desired $W_y$.

FIGURE 1
Computing diagram for estimating the length of a test, Y, such that $W_y = \sigma_y/\sigma_x$, where:
$W_y$ = the desired weight of the test Y,
$r_{xx}$ = reliability of test X,
K = ratio of estimated length of Y to length of X.
[Nomograph not reproduced: K (vertical axis, 1.0 to 12.0) plotted against $r_{xx}$
(horizontal axis, .10 to 1.00), with one curve for each of W = 1.0, 1.5, 2.0, 3.0, 4.0, and 5.0.]

It should be emphasized that the derivation of $W_y$ depends upon one
important assumption: that the components of Y are parallel forms of test X.
For the development of aptitude tests this may impose no significant practical
limitation, but for achievement testing the situation is different. Achievement
testing at different stages of learning yields scores on individuals who may 
differ in their rate of learning. In addition, course content is not necessarily 
highly interdependent among its various stages. For these reasons it seems 
reasonable to doubt the comparability of two achievement tests separated 
by a period of learning, unless some empirical evidence can be offered to 
show that such a procedure makes little practical difference. We shall return 
to the empirical question in a later portion of the paper. 


Weighting by True Scores 


One major difficulty in assuming that the weight of a test is a function 
of its standard deviation is that tests of low reliability will necessarily have 
small standard deviations. Thus, scores of an unreliable test may be multiplied 
by a constant so as to increase the test’s standard deviation in relation to a 
second more reliable test. The composite score thus becomes contingent 
upon the more unreliable test. This difficulty has been acknowledged ([2],
pp. 385-396), but the proposals for overcoming it have been varied. A solution
that meets this objection, and one which seems to make rational sense, is
to define test weight in terms of the ratio of the standard deviations of true
scores. Thus,

$$ W_y = \sigma_{t_y}/\sigma_{t_x}. \tag{8} $$

In what manner does test length affect test weight defined in this way?
Let us regard the true score on test Y as composed of tests of unit length,
X. In deviation scores,

$$ t_y = \sum_{i=1}^{k} t_{x_i}. \tag{9} $$

Again assuming comparable forms among the components of Y, it
follows that the $t_{x_i}$ will be equal. Then (9) becomes

$$ t_y = k t_x, $$

and

$$ \sigma_{t_y}^2 = \frac{\sum k^2 t_x^2}{N} = k^2 \sigma_{t_x}^2. $$

Solving for k,

$$ k^2 = \sigma_{t_y}^2/\sigma_{t_x}^2; \qquad k = \sigma_{t_y}/\sigma_{t_x}. $$

Substituting from (8),

$$ k = W_y. \tag{10} $$

Equation (10) states that if test weight is defined as the ratio of the
sigmas of true scores, increasing the length of the test by the proportion k
increases its weight by k also. Thus, if one wishes to write a test that will
count twice as much as a given test, he simply writes twice the number of items. 
This coincides with the intuitively justified practices of many teachers who 
have no knowledge of test theory. The practice can now be seen to be statis- 
tically justified, when the assumption of parallel forms is met. 

To obtain evidence concerning the accuracy with which estimates of $W_y$
can be made, scores were obtained from midsemester and final examinations
in Introductory Psychology for a group of 54 college freshmen. Both examina-
tions were multiple choice, the final consisting of 105 items. Only the first
30 items of the midsemester examination were used. Successive portions of
the final examination were scored, yielding totals for each individual for
the first 30, 60, 75, 90, and 105 items. The successive scores are thus not
independent, a fact which detracts from the meaningfulness of the comparisons
but which does not invalidate them. These results are plotted in Figures 2,
3, and 4. Figure 2 compares obtained values, $W_y = \sigma_y/\sigma_x$, with values of $W_y$
estimated from (2). In this case, X is the 30-item midsemester examination,
and Y is the final exam of varying length. Figure 3 differs from Figure 2
only in that test X now consists of the first 30 items of the final examination,
and test Y is composed of the successive portions of this examination, not
including the first 30 items. It can be assumed that the difference between
these two figures is due to the fact that the assumption of comparable forms is
more nearly met for the latter situation than for the former.

FIGURE 2
Predicted and obtained weights of test Y. Predictions made on the basis of 30-item midterm
examination. Test Y consists of accumulations of items of the final examination beyond
the first thirty.
[Plot not reproduced: $W_y$ (vertical axis) against Test Length (horizontal axis), with
separate lines for predicted and obtained values.]

FIGURE 3
Predicted and obtained weights of test Y. Predictions made on the basis of first 30 items
of final examination. Test Y consists of accumulations of items of the final examination
beyond the first thirty.
[Plot not reproduced: $W_y$ (vertical axis) against Test Length (horizontal axis), with
separate lines for predicted and obtained values.]











[FIGURE 4. Predicted and obtained weights of test Y. Test weight defined as ratio of true scores. Axes: test length (30 to 90 items) against $W_y$; curves: predicted (solid diagonal), obtained with test X = first 30 items of final, and obtained with test X = first 30 items of midterm.]


Figure 4 shows the predicted and obtained values of $W_y$ when defined as
a ratio of the sigmas of true scores, according to (8). In this case, the predicted values are
exactly proportional to test length; hence the solid diagonal line represents
these predictions. The actual obtained values for $W_y$ were in this case determined
by noting that $\sigma_{t_y}^2 = r_{yy}\sigma_y^2$ and $\sigma_{t_x}^2 = r_{xx}\sigma_x^2$. Therefore,

$W_y = \dfrac{\sigma_y \sqrt{r_{yy}}}{\sigma_x \sqrt{r_{xx}}} .$


The reliabilities were estimated from the item data, using Kuder-Richardson 
Formula 20. As was apparent in a comparison of Figures 2 and 3, so too in 
Figure 4, the use of the first 30 items of the final examination as test X 
results in predictions which appear to be more accurate than those based 
on the midsemester examination. The necessity for satisfying the assumption 
of parallel forms seems again to be indicated. 
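
The computation of an obtained true-score weight from item data can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's procedure: the function names and the toy 0/1 item matrices are hypothetical, population variances (division by N) are used throughout, and the weight is taken as the ratio of observed sigmas, each multiplied by the square root of its Kuder-Richardson Formula 20 reliability.

```python
import math
from statistics import pvariance


def kr20(items):
    """Kuder-Richardson Formula 20 for rows of 0/1 item scores (one row per examinee)."""
    n_people = len(items)
    n_items = len(items[0])
    var_total = pvariance([sum(row) for row in items])
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in items) / n_people  # proportion passing item j
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


def obtained_true_score_weight(items_x, items_y):
    """(sigma_y * sqrt(r_yy)) / (sigma_x * sqrt(r_xx)), with KR-20 as the reliability.

    Raises ValueError if either KR-20 estimate is negative.
    """
    sd_x = math.sqrt(pvariance([sum(row) for row in items_x]))
    sd_y = math.sqrt(pvariance([sum(row) for row in items_y]))
    return (sd_y * math.sqrt(kr20(items_y))) / (sd_x * math.sqrt(kr20(items_x)))


if __name__ == "__main__":
    # Hypothetical toy data: 4 examinees, 3 items on X and 6 items on Y.
    toy_x = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    toy_y = [row + row for row in toy_x]
    print(kr20(toy_x), kr20(toy_y), obtained_true_score_weight(toy_x, toy_y))
```

Because KR-20 is estimated separately for each test, the obtained weight need not equal the ratio of test lengths, even for data as regular as the toy example.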

It should be emphasized that the definition of parallel forms, necessary 
to satisfy the assumptions of the equations derived in this paper, is one 
which demands only that the variances and intercorrelations be equal. We 
need not say that the intercorrelations are perfect, or even that they are 
high. To assume such identity would reduce the entire question of differential 
weighting to a triviality, except as it may lead to the maximization of the 
reliability of the composite or to the prediction of an external criterion. 







REFERENCES 


[1] Gulliksen, H. Theory of mental tests. New York: Wiley, 1950. 

[2] Horst, P. The prediction of personal adjustment. SSRC Bulletin No. 48, 1941. 

[3] Kelley, T. L. Interpretation of educational measurements. New York: World Book, 
1927. 

[4] Thurstone, L. L. The reliability and validity of tests. Ann Arbor, Michigan: Edwards 
Bros., 1931. 

[5] Wilks, S. S. Weighting systems for linear functions of correlated variables when there 
is no dependent variable. Psychometrika, 1938, 3, 23-40. 


Manuscript received 7/23/56 


Revised manuscript received 4/18/57 








RULES FOR PREPARATION OF MANUSCRIPTS FOR 
PSYCHOMETRIKA 


1. Send manuscripts to the Managing Editor:


LYLE V. JONES 
Psychometric Laboratory 
University of North Carolina 
Chapel Hill, North Carolina 


2. Submit three typewritten copies of the manuscript. For original copy use heavy white
typewriter paper, size 8½ x 11. Double-space the lines, leave ample space around
formulas, and allow wide margins for editorial work.

3. Accompanying the manuscript should be three copies of an Abstract of no more than
100 words, outlining the contents of the paper. 

4. Tables should be submitted with the manuscript in four copies. Prepare original copy
of tables on electric typewriter, in a form suitable for photographic reproduction. The 
remaining three copies need not be prepared on an electric typewriter, but should 
adhere to the prescribed form. 


Tables are to be numbered with Arabic numerals and referred to in the text by number,
e.g., Table 2. The heading of the table should be centered. The word “Table,” on the
first line of the heading, should appear in capital letters, e.g., TABLE 2. The title, 
double-spaced below the table number, should have initial letters of principal words 
capitalized. Titles should be short; if two lines are required they should be single- 
spaced. 


Double horizontal lines should separate the heading from the stubhead, a single line 
should appear between the stubhead and the body of the table, and a single line should 
appear at the bottom of the table. Footnotes referring to any part of the table should 
be single-spaced immediately below the table. Tables appearing in Psychometrika, 
1956, 21, 362-363 show a variety of examples in good form. 


For the electrically typed copy of tables, heavy white paper should be used, and no 
erasures should appear. Corrected entries may be pasted over errors using rubber 
cement. On this copy, closely related tables should be prepared or mounted on the 
same sheet in such a way that final copy will fit the journal page after reduction. If 
this results in a sheet size exceeding 8½ x 11 inches, the use of mailing tubes is
recommended.


5. Figures should be drawn only by an expert draftsman, about three times the size at
which they will appear. They should be on plain white paper or tracing cloth in black 
India ink. They should be referred to in the text by number, e.g., Fig. 3. Each figure 
caption, including the figure number and a succinct title, should be typed on a separate 
sheet of paper. No such identification should appear on the front of the figure. On the 
margin of the back of the figure the author should write lightly his name and the figure 
number. In addition to the original copy of figures, three photographic reproductions 
or rough sketches of figures should be submitted with the manuscript. 

6. Formulas should be numbered at the left margin with Arabic numerals in parentheses.
Careful attention should be given to the punctuation of formulas, which ordinarily
are to be regarded as parts of sentences. Formulas should be legible, and unfamiliar
symbols avoided if possible. Where they are used for the first time, they should be 
defined in the margin, as “upper case Greek letter gamma.” For very complicated 
notations, a list for the use of the printer should be submitted. 




7. Footnotes to the text should be reduced to a minimum. Formulas in footnotes should
be avoided. Footnotes should be indicated by the following symbols: *(asterisk),
†(dagger), ‡(double dagger), §(section mark), ‖(parallels), ¶(paragraph mark).
Footnotes should be typed at the bottom of the page of text to which they refer.

8. References should be segregated at the end of the article. The heading should be
“References” not “Bibliography,” and should be capitalized and centered. The references
in such a list should be arranged in alphabetical order according to author’s name,
and numbered with Arabic numerals in brackets. In the text references and pages
should be referred to by number: [2], [2, 6, 10], [cf. 3], [e.g., 4, 6], ([2], p. 36), (cf. [2],
[5], p. 20, eq. 13).


With only minor exceptions, the forms of citation adopted by the Board of Editors 
of The American Psychological Association are used in Psychometrika. (See American 
Psychological Association, Council of Editors. Publication manual of the American 
Psychological Association, 1957 revision. Washington, D. C.: American Psychological 
Association, 1957.) The form for a journal reference is as follows: 


[1] Gulliksen, H. and Tucker, L. R. A mechanical model illustrating the scatter dia-
gram with oblique test vectors. Psychometrika, 1951, 16, 233-238. 


The form for a book reference is as follows: 


[6] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947. 

9. A separate sheet giving the title of the article and the author’s name and professional
connection should be included with the manuscript. The author’s name or professional
connection should not appear on the manuscript. There should be no reference which
would identify the author of the manuscript, e.g., “In previous work [14], the present
writer has shown that ...” Since all such statements must be removed before the
article is sent to the editors, it will facilitate work in the editorial office if these pre- 
cautions are observed. 


10. The author is urged to give careful attention to grammatical construction, spelling,
and punctuation.


11. The journal will provide 100 free offprints of each article. Additional offprints will be
available in accordance with the following schedule:

                      2 pp.    4 pp.    8 pp.    12 pp.   16 pp.   Add. 2 pp.

100 copies            $4.00    $8.00    $12.00   $16.00   $20.00   $2.00

Each additional 100   $2.00    $4.00    $ 6.00   $ 8.00   $10.00   $1.00


A blank page counts as one page. Covers: $12.00 first hundred, $5.00 each additional 
hundred. 
















